(4) Introduction


Mammographic screening has been recognized as the most effective method for early
detection of breast cancer1-4. Studies indicate that radiologists do not detect all carcinomas that
are visible upon retrospective analyses of the images5-10. Various methods are being developed to
improve the sensitivity and specificity of breast cancer detection. Double reading can reduce the
miss rate of radiographic reading; however, double reading by radiologists is costly. Computer-aided
diagnosis (CAD) is considered to be one of the promising approaches that may improve
the efficacy of mammography. It has been shown that CAD can significantly improve radiologists'
detection accuracy. Our receiver operating characteristic (ROC) study and
that by Jiang et al.17,18 also showed that computer classifiers can improve radiologists' ability to
differentiate malignant from benign masses or microcalcifications. CAD is thus a viable, cost-effective
alternative to double reading by radiologists.

Most of the CAD systems developed so far are based on lesions marked by radiologists on
mammograms and proven to be cancer by biopsy. Some researchers19-23 have
investigated the performance change of CAD systems when using prior mammograms (i.e., the
mammograms from previous exams on which the cancer can be seen retrospectively but was called
negative or probably benign at the time of the exam). The ability of a CAD system to detect
these cancers is important because it signifies early detection of cancers that radiologists may
overlook. On the other hand, when a CAD system is applied to a new mammogram in clinical
practice, it has to detect breast lesions of all degrees of subtlety effectively. Our experience
indicates that it is difficult to train a single CAD system to provide optimal detection for all
lesions over the entire spectrum of subtlety, because the classifiers have to make compromises to
accommodate cancers with a wide range of characteristics.

The goal of this proposed project is to develop a CAD system using advanced computer
vision techniques aimed at improved detection of cancers seen retrospectively on prior
mammograms, and to incorporate the developed CAD system into our current CAD system. We
hypothesize that a dual CAD system, which combines a system trained with subtle lesions
seen retrospectively on prior mammograms and a system trained with cancers detected on
current mammograms, should increase the sensitivity of detecting cancers at an early stage
without compromising its ability to detect less subtle cancers. To accomplish this goal, we have
performed the following tasks: (1) collection of a large database of masses on digitized prior and
current film mammograms (DFMs) for training and testing the CAD system, (2) development of
single-view computer vision techniques for mass detection and classification in prior DFMs, (3)
reduction of false positives (FPs) by correlation of image information from multiple-view
mammograms, (4) development of a dual system scheme which combines the new CAD system
with our current CAD system without an increase in overall FPs, and (5) evaluation of the effects
of the developed CAD system with a large data set in detecting both average and subtle cancers.

It is expected that we will develop a fully automated CAD system which can be used for
detection of masses on DFMs. Although we do not plan to develop such a system for digital
mammograms, because there will not be enough prior digital mammograms with missed cancers
available for the development, the general methodology developed in this study can be adapted
to CAD systems for digital mammograms in the future. The significance of this project is that we
have accomplished the goal of developing a CAD system which can further improve
radiologists' accuracy in detecting breast cancers at an early stage. Since early detection and
treatment can reduce the breast cancer mortality rate, the CAD system will be useful for increasing
the effectiveness of mammographic screening.
(5) Body


This is the final report of this project. We have described the results of our studies in
detail in the past annual progress reports. The investigations conducted in this project are
summarized below.

(5.A)  Collection  of  a  data  set  of  mammograms  (Task  I) 

With IRB approval, we have collected a database of digitized screen-film mammograms
(DFMs) from patient files in the Department of Radiology at the University of Michigan. The
mass data set collected in this study contained 220 cases with masses. Each case included the
current mammograms on which the mass was detected by radiologists and the prior
mammograms obtained from previous exams. The mass set contained 440 current mammograms
and 496 prior mammograms. The true location of each mass was identified by an experienced
Mammography Quality Standards Act (MQSA) radiologist. The radiologist also measured the
mass size and provided descriptions of the mass margin, shape, conspicuity, and breast density.

(5.B) Development of single-view computer vision techniques for mass detection and classification on prior mammograms (Task II)

In this project, we have developed a series of computer vision techniques for mass
detection on single-view mammograms. We have newly developed a two-stage gradient field
analysis method which not only uses the shape information of masses on mammograms but also
incorporates a second stage in which the gray level information of the local object segmented by
a region growing technique is refined by gradient field analysis. For comparison with the spatial gray
level dependence (SGLD) texture features extracted from current mammograms, we extracted
gray level features and run length statistics (RLS) texture features inside and outside of
the mass region on both the original image and the gradient field image from prior mammograms.
In CAD applications, an important step is to design a classifier for the differentiation of
abnormal from normal structures. In this project, we have also investigated the performance
of a regularized discriminant analysis (RDA) classifier in combination with a feature selection
method for classification of the masses and normal tissues detected on mammograms. We have
applied these computer vision techniques to mass detection on both full field digital
mammograms (FFDMs) and DFMs, and found that they consistently improved the accuracy
of mass detection on both FFDMs and DFMs. (Publications: J1, J3, P1,
P4, P5)
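The report does not spell out the RDA formulation. The following is a minimal sketch, assuming a simplified form of Friedman's two-parameter regularization: each class covariance is shrunk toward the pooled covariance by `lam` and then toward a scaled identity matrix by `gamma`. The function names and default parameter values are illustrative only, not the project's implementation, and the feature selection step is omitted.

```python
import numpy as np

def rda_fit(X, y, lam=0.5, gamma=0.1):
    """Fit a simplified regularized discriminant analysis (RDA) model.

    With gamma=0, lam=0 gives a quadratic discriminant (class covariances)
    and lam=1 gives a linear discriminant (pooled covariance).
    gamma additionally shrinks toward a scaled identity matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, p = X.shape
    pooled = np.zeros((p, p))
    stats = {}
    for c in classes:
        Xc = X[y == c]
        S = np.atleast_2d(np.cov(Xc, rowvar=False))  # unbiased class covariance
        stats[c] = (Xc.mean(axis=0), S, len(Xc))
        pooled += (len(Xc) - 1) * S
    pooled /= n - len(classes)                       # pooled within-class covariance
    model = {}
    for c, (mu, S, n_c) in stats.items():
        cov = (1 - lam) * S + lam * pooled
        cov = (1 - gamma) * cov + gamma * (np.trace(cov) / p) * np.eye(p)
        model[c] = (mu, cov, np.log(n_c / n))        # mean, regularized covariance, log prior
    return model

def rda_predict(model, X):
    """Assign each sample to the class with the largest Gaussian discriminant score."""
    X = np.asarray(X, dtype=float)
    labels = list(model)
    scores = []
    for c in labels:
        mu, cov, log_prior = model[c]
        d = X - mu
        maha = np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d)  # squared Mahalanobis distance
        logdet = np.linalg.slogdet(cov)[1]
        scores.append(-0.5 * (maha + logdet) + log_prior)
    return np.asarray(labels)[np.argmax(np.column_stack(scores), axis=1)]
```

For the two-class mass/normal problem, the difference between the two class scores could serve as a continuous decision variable for thresholding; `lam` and `gamma` would normally be tuned on training data only.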

(5.C) Reduction of FPs by correlation of image information from multiple-view mammograms (Task III)

In mammographic screening, a craniocaudal (CC) and a mediolateral oblique (MLO) or
lateral (LAT) view are generally taken for each breast. The two views not only allow most of the
breast tissue to be imaged but also improve the chance that a lesion will be seen in at least one of
the views. The radiologist uses the two views to confirm true positives (TPs) and to reduce false
positives (FPs). In an effort to improve the performance of our single CAD system, we have
conducted several studies using multiple-view information from the same patient. We first
developed a two-view information fusion method which combines the information from two
mammographic views of the same breast. Then, we investigated an FP reduction method based
on analysis of bilateral mammograms. We have also developed a four-view CAD system to
improve the performance of mass detection for our computerized detection system. We found
that our multiple-view CAD system significantly improved the accuracy of mass detection on
mammograms. (Publications J4, J5, P3, A3, A4, A7)
(5.D) Development of an information fusion scheme to combine the new CAD system with the existing CAD system for mass detection (Task IV)

In this project, we have developed a dual system scheme which combined a CAD system
optimized with "average" masses with another CAD system optimized with "subtle" masses.
The two single CAD systems for mass detection have similar image processing steps but were
trained differently: one with the current mammograms and the other with the prior mammograms. A feed-forward
backpropagation artificial neural network (BP-ANN) was trained to differentiate the masses
from normal tissues by combining the output information from the two single CAD systems. In
this ANN, the nodes are organized in an input layer, an output layer, and one hidden layer. The
two linear discriminant analysis (LDA) scores from the two CAD systems were used as input to
the BP-ANN, so the BP-ANN has two input nodes, a single hidden layer with 3 hidden nodes, and
one output node. The nodes are interconnected by weights, and information propagates from one
layer to the next through a log-sigmoid transfer function. The learning of the ANN is a
supervised process in which training cases with known truth are input to the ANN. The performance
function for the feed-forward network was the mean-square error, i.e., the average squared
error between the network outputs and the target values over all training samples. The gradient
of the performance function was used to determine how to adjust the weights to minimize the
error. The gradient is determined using an iterative backpropagation procedure which involves
performing computations backward through the network. We found that the ANN fusion scheme
can provide significant improvement in the accuracy of the mass detection CAD system in
comparison with that of a single CAD system. (Publications J2, P2, A1, A2, A5, A6)

(5.E)  Evaluation  of  the  proposed  CAD  system  with  a  large  data  set  (Task  V) 

The  detection  of  masses  on  mammograms  is  a  challenging  task  because  the  overlapping 
fibroglandular  t  issue  rna  y  m  imic  a  mass  o  r  ob  scures  the  lesion.  Alth  ough  researchers  h  ave 
devoted  e  xtensive  efforts  to  t  he  development  o  f  CAD  s  ystems  for  m  ass  d  etection,  the 
performances  of  current  CAD  systems  are  far  from  ideal.  We  have  developed  a  dual  system 
approach  and  a  four-view  analysis  method.  In  the  end  of  this  project,  we  have  combined  the  dual 
system  approach  with  the  four-view  approach  and  collected  a  relatively  large  data  set  to  evaluate 
the  effectiveness  o  f  our  four-view  dual  CAD  system  .  Besides  t  he  data  set  co  llected  in  this 
project,  we  also  included  369  patients  collected  by  our  previous  projects.  In  total,  we  used  589 
patients  including  a  mass  set  with  389  patients  and  a  normal  set  with  200  patients  in  this  study. 
Each  patient  had  two  views  (CC  and  MLO/LAT)  for  each  breast.  The  overall  test  performance 
was  assessed  by  the  free  response  receiver  operating  characteristic  (FROC)  curves.  It  was  found 
that  our  the  four-view  dual  CAD  system  achieved  an  FP  rate  of  1 .04,  0.80,  and  0.60  FPs/image  at 
the  case-based  sensitivities  of  90%,  85%  and  80%,  respectively,  which  represents  a  statistically 
significant  improvement  ov  er  the  conv  entional  singl  e-view  detection  a  pproach,  using  the 
Jackknife  alternative  FROC  (JAFROC)  method..  (Publications  J5,  A7) 
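Operating points such as "1.04 FPs/image at 90% case-based sensitivity" can be read off detection scores as sketched below. This assumes the usual case-based convention that a mass counts as detected if it is marked on at least one view, and that FP marks are pooled over a known number of images; the function names and input layout are hypothetical, and the JAFROC significance test itself is not reproduced.

```python
import math
import numpy as np

def case_best_scores(tp_marks_by_case):
    """Highest score among the marks that hit the mass in each case (any view).
    A case without any true-positive mark gets -inf and can never be detected."""
    return np.array([max(s) if len(s) else -np.inf for s in tp_marks_by_case.values()])

def fp_rate_at_sensitivity(tp_marks_by_case, fp_scores, n_images, target):
    """FP marks per image at the highest threshold reaching the target
    case-based sensitivity (one FROC operating point)."""
    best = np.sort(case_best_scores(tp_marks_by_case))[::-1]   # descending
    k = math.ceil(target * len(best) - 1e-9)                    # cases that must be detected
    if k <= 0:
        return 0.0
    thr = best[k - 1]
    if not np.isfinite(thr):
        return math.inf                                         # target sensitivity unreachable
    return np.count_nonzero(np.asarray(fp_scores) >= thr) / n_images
```

Sweeping `target` over a range of sensitivities traces out the FROC curve; the FP scores may come from the mass set or from a normal set, depending on which FP rate is being reported.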


(6)  Key  Research  Accomplishments 

• Collected DFMs of 220 cases, with 936 mammograms, for training and testing the
computerized detection systems (Task I).

• Developed a series of computer vision techniques for mass detection and classification on
single-view mammograms (Task II).

•  Developed  a  four-view  CAD  system  for  mass  detection  (Task  III). 

•  Developed  a  dual  system  approach  for  mass  detection  (Task  IV). 
• Combined the dual system approach with the four-view approach and evaluated the
combined system with a relatively large data set (Task V).


(7)  Reportable  Outcomes 

As a result of the support by the USAMRMC BCRP grant, we have developed a dual
system approach with four-view analysis which significantly improved the performance of mass
detection on mammograms in comparison with the conventional single-view detection approach.

The publications from this project are listed below. Many of these have been reported in the
previous annual reports.

Journal  Articles: 

J1. Wei, J., Sahiner, B., Hadjiiski, L. M., Chan, H.-P., Petrick, N., Helvie, M. A., Roubidoux,
M. A., Ge, J. and Zhou, C., "Computer aided detection of breast masses on full field
digital mammograms," Medical Physics 32, 2827-2838 (2005).

J2. Wei, J., Chan, H.-P., Sahiner, B., Hadjiiski, L. M., Helvie, M. A., Roubidoux, M. A.,
Zhou, C. and Ge, J., "Dual system approach to computer-aided detection of breast masses
on mammograms," Medical Physics 33, 4157-4168 (2006).

J3. Wei, J., Hadjiiski, L. M., Sahiner, B., Chan, H.-P., Ge, J., Roubidoux, M. A., Helvie, M.
A., Zhou, C., Wu, Y.-T., Paramagul, C. and Zhang, Y., "Computer aided detection
systems for breast masses: Comparison of performances on full-field digital
mammograms and digitized screen-film mammograms," Academic Radiology 6, 659-669
(2007).

J4. Wu, Y.-T., Wei, J., Hadjiiski, L. M., Sahiner, B., Zhou, C., Ge, J., Shi, J., Zhang, Y. and
Chan, H.-P., "Bilateral analysis based false positive reduction for computer-aided mass
detection," Medical Physics 34, 3334-3344 (2007).

J5. Wei, J., Chan, H.-P., Sahiner, B., Zhou, C., Hadjiiski, L. M., Roubidoux, M. A. and
Helvie, M. A., "Computer-aided detection of breast masses on mammograms: Dual
system approach with two-view analysis," Medical Physics (Accepted).

Conference Proceedings:

P1. Wei, J., Sahiner, B., Hadjiiski, L. M., Chan, H.-P., Petrick, N., Helvie, M. A., Zhou, C.
and Ge, Z., "Computer aided detection of breast masses on full-field digital
mammograms: False positive reduction using gradient field analysis," Proc. SPIE 5370,
992-998 (2004).

P2. Wei, J., Sahiner, B., Hadjiiski, L. M., Chan, H.-P., Helvie, M. A., Roubidoux, M. A., Petrick, N., Zhou,
C. and Ge, J., "Computer aided detection of breast masses on mammograms:
Performance improvement using a dual system," Proc. SPIE 5747, 9-15 (2005).

P3.  Wei,  J.,  Sahiner,  B.,  Hadjiiski,  L.  M.,  Chan,  H.-P.,  Helvie,  M.  A.,  Roubidoux,  M.  A., 
(8) Conclusions

In this project, we have developed a dual CAD system, which combined a CAD system
optimized with "average" masses with another CAD system optimized with "subtle" masses, for
mass detection on mammograms. Our studies showed that the improvement in the FROC curves
achieved by the dual system approach was statistically significant (p<0.05) for the detection of both
average masses and subtle masses using the JAFROC method. In addition, we have developed a
four-view approach to improve computerized detection of breast masses on mammograms. We
have evaluated our approach using a relatively large data set. Our results indicate that the
proposed approach is able to further improve the detection performance, as estimated by
JAFROC analysis. The significance of this project is that the newly developed CAD system may
be able to aid radiologists in detecting breast cancers at an early stage. Since early detection and
treatment can reduce the breast cancer mortality rate and health care costs, the proposed CAD
system is expected to improve the efficacy of mammography for breast cancer screening.
We are developing a computer-aided detection (CAD) system for breast masses on full field digital
mammographic (FFDM) images. To develop a CAD system that is independent of the FFDM
manufacturer's proprietary preprocessing methods, we used the raw FFDM image as input and
developed a multiresolution preprocessing scheme for image enhancement. A two-stage prescreening
method that combines gradient field analysis with gray level information was developed to
identify mass candidates on the processed images. The suspicious structure in each identified region
was extracted by clustering-based region growing. Morphological and spatial gray-level dependence
texture features were extracted for each suspicious object. Stepwise linear discriminant
analysis (LDA) with simplex optimization was used to select the most useful features. Finally,
rule-based and LDA classifiers were designed to differentiate masses from normal tissues. Two data
sets were collected: a mass data set containing 110 cases of two-view mammograms with a total of
220 images, and a no-mass data set containing 90 cases of two-view mammograms with a total of
180 images. All cases were acquired with a GE Senographe 2000D FFDM system. The true
locations of the masses were identified by an experienced radiologist. Free-response receiver operating
characteristic analysis was used to evaluate the performance of the CAD system. It was found
that our CAD system achieved case-based sensitivities of 70%, 80%, and 90% at 0.72, 1.08, and
1.82 false positive (FP) marks/image on the mass data set. The FP rates on the no-mass data set
were 0.85, 1.31, and 2.14 FP marks/image, respectively, at the corresponding sensitivities. This
study demonstrated the usefulness of our CAD techniques for automated detection of masses on
FFDM images. © 2005 American Association of Physicists in Medicine.
[DOI: 10.1118/1.1997327]

Key words: computer-aided detection, full field digital mammogram (FFDM), multiresolution image
enhancement, gradient field analysis, stepwise linear discriminant analysis


I.  INTRODUCTION 

Breast cancer is one of the leading causes of death among
American women between 40 and 55 years of age.1 It has
been reported that early diagnosis and treatment can significantly
improve the chance of survival for patients with breast
cancer.2-4 Although mammography is the best available
screening tool for detection of breast cancers, studies indicate
that a substantial fraction of breast cancers that are visible
upon retrospective analyses of the images are not detected
initially.5-8 Computer-aided diagnosis (CAD) is
considered to be one of the promising approaches that may
improve the sensitivity of mammography.9,10 Computer-aided
lesion detection can be used during screening to reduce
oversight of suspicious lesions that warrant further work-up.
Computer-aided lesion characterization can assist in the estimation
of the likelihood of malignancy of lesions by using
image and/or other information during the diagnostic stage.
The majority of studies to date show that CAD can improve
radiologists' lesion detection sensitivity,11-16 although Gur et
al. found that CAD had no significant effect on the
radiologists in their academic setting when they averaged the
results from both low-volume and high-volume radiologists.
Further analysis of Gur's data by Feig et al. indicated that
the 17 low-volume radiologists in Gur's study achieved a similar
increase in sensitivity to that reported in other studies. The
outcome of CAD studies therefore depends on the study design
and data analysis.

A number of investigators have reported CAD algorithms
for detection of masses on mammograms. Their approaches
to prescreening of mass candidates were based primarily on
mass characteristics including: (1) asymmetric density between
left and right mammograms, (2) texture, (3)
spiculation,25,26 (4) gray level contrast,27-31 and (5)
gradient. Some of these approaches were refined with a
combination of the mass characteristics. Feature classifiers
were then used to further differentiate masses from normal
breast tissues.

Most mammographic CAD algorithms developed so far
are based on digitized screen-film mammograms (SFMs). In
the last few years, full field digital mammographic (FFDM)
technology has advanced rapidly because of the potential of
digital imaging to improve breast cancer detection. Several
manufacturers have obtained clearance from the FDA for
clinical use. It is expected that FFDM detectors will provide
higher signal-to-noise ratio (SNR) and detective quantum efficiency,
wider dynamic range, and higher contrast sensitivity
than digitized mammograms. The spatial resolution of digital
detectors may also be different from that of digitized SFMs
even when their pixel pitches are equal. Li et al. investigated
the mass detection performance of their CAD system that
was developed for SFMs and modified for FFDMs. Their
preliminary results on a small data set showed that it
achieved 60% sensitivity at 2.47 false positives (FPs)/image.
It is expected that proper adaptation based on the imaging
characteristics of FFDMs and re-training of the CAD system
with FFDMs would improve the performance. Because of
the higher SNR and linear response of digital detectors, there
is also a strong potential that more effective feature extraction
techniques can be designed to optimally extract signals
from the image and improve the accuracy of CAD. Several
commercial CAD systems have already obtained FDA approval
for use with FFDMs. The commercial CAD systems generally
reported similar performance on FFDMs and SFMs.
However, these studies were not reported in peer-reviewed journals,
so the data sets and algorithms are unknown. Recently,
an assessment study34 comparing the performance of
two commercial CAD systems and one research CAD system for SFMs
showed that their mass detection sensitivities ranged from
67% to 72% and the FP rates ranged from 1.08 to 1.68 per
four-view examination. The differences in sensitivities were
not significant, whereas the differences in the FP rates were
significant, depending on the examinations and CAD systems
used.34

We have developed a CAD system for the detection of
masses on SFMs in our previous studies. We are developing
a mass detection system for mammograms acquired
directly by an FFDM system. In this study, we adapted our
mass detection system developed for SFMs to FFDMs by
optimizing each stage and retraining. In an effort to develop
a CAD system that is less dependent on the FFDM manufacturer's
proprietary preprocessing methods, we used the raw
FFDM as input and developed a multiresolution preprocessing
scheme for image enhancement. A new technique was
also designed for prescreening of mass candidates on the
preprocessed images.

II.  MATERIALS  AND  METHOD 
A.  Data  sets 

The mammograms were collected from patient files at the
Department of Radiology with Institutional Review Board
approval. Digital mammograms at the University of Michigan
are acquired with a GE Senographe 2000D FFDM system.
The GE system has a CsI phosphor/a:Si active matrix
flat panel digital detector with a pixel size of 100 μm
× 100 μm and 14 bits per pixel. In this study, we used two
data sets: a mass set containing FFDMs with malignant or
benign masses and a no-mass set containing FFDMs without
masses. The no-mass set was obtained from microcalcification
cases collected for the development of our microcalcification
CAD systems. The cases were included as normal,
with respect to masses, only if they were verified to be free
of masses by an experienced Mammography Quality Standards
Act (MQSA) radiologist. Our mass detection system
aims at application to screening mammography, so the
mass cases, whether malignant or benign, are considered
positive. All cases had two mammographic views, the craniocaudal
view and the mediolateral oblique view or the lateral
(LM or ML) view. The mass set contained 110 cases
with a total of 220 images. The no-mass set contained 90
cases with a total of 180 images. The mass data set was used
to estimate the detection sensitivity, and the no-mass data set
was used to estimate the FP rate. There were a total of 110
biopsy-proven masses in the mass data set. Eighty-seven of
the masses were benign and 23 were malignant.
An MQSA radiologist identified the locations of the
masses, measured the mass sizes as the longest dimension
seen on the two-view mammograms, provided descriptors of
the mass shapes and mass margins, and also provided an
estimate of the breast density in terms of BI-RADS category.
Figure 1 shows the distributions of mass sizes, mass shapes,
mass margins, and breast density in our data set.

B.  Methods 

Our CAD system consists of five processing steps: (1)
preprocessing by multiscale enhancement, (2) prescreening
of mass candidates, (3) identification of suspicious
objects, (4) feature extraction and analysis, and (5) FP reduction
by classification of normal tissue structures and masses.
The block diagram of the detection scheme is shown in
Fig. 2. These steps are described in more detail in the following.

We randomly separated the mass data set into two independent,
equal-sized subsets. Each subset contained 55 cases
with 110 images. Cross validation was used for training and
testing the algorithms. The training included selecting the
preprocessing Laplacian pyramid reconstruction weights, adjusting
the filter weights for prescreening and clustering, determining
thresholds for rule-based classification, and selecting
morphological and texture features and classifier
weights. Once the training with one subset was completed,
the parameters and all thresholds were fixed for testing with
the other subset. The training and test subsets were switched
and the training process was repeated. The overall detection
performance was evaluated by combining the performances
for the two test subsets. The trained algorithms with the fixed
parameters were also applied to the no-mass mammograms
to estimate the FP rate in screening mammograms.
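The two-fold, case-level cross validation described above can be sketched as follows. Splitting by case rather than by image keeps both views of a patient in the same subset. This is a minimal sketch in which `train_fn` and `test_fn` are hypothetical callbacks standing in for the training stages (weights, thresholds, feature selection) and the fixed-parameter detection run.

```python
import numpy as np

def two_fold_case_split(case_ids, seed=0):
    """Randomly split the unique case IDs into two equal-sized, disjoint subsets."""
    cases = sorted(set(case_ids))
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(cases))
    half = len(cases) // 2
    return {cases[i] for i in order[:half]}, {cases[i] for i in order[half:]}

def two_fold_cross_validation(images, train_fn, test_fn, seed=0):
    """images: list of (case_id, image) pairs.
    train_fn(train_images) -> params; test_fn(params, test_images) -> list of results."""
    subset_a, subset_b = two_fold_case_split([cid for cid, _ in images], seed)
    results = []
    for train_cases, test_cases in ((subset_a, subset_b), (subset_b, subset_a)):
        train = [img for cid, img in images if cid in train_cases]
        test = [img for cid, img in images if cid in test_cases]
        params = train_fn(train)               # all parameters and thresholds fixed here
        results.extend(test_fn(params, test))  # applied unchanged to the held-out subset
    return results
```

The pooled `results` from both test folds would then be combined into one FROC analysis; the no-mass images are never used for training and would only be passed through `test_fn`.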

1.  Preprocessing 

FFDMs are generally preprocessed with proprietary methods
by the manufacturer of the FFDM system before being
displayed to readers. The image preprocessing method used
depends on the manufacturer of the FFDM system. To develop
a CAD system that is less dependent on the FFDM
manufacturer's proprietary preprocessing methods, we use
the raw FFDM as input to our CAD system. We developed a
multiscale preprocessing scheme for image enhancement.

Multiscale methods have been used for contrast enhancement of medical images. Since a multiscale method uses the information from a large number of frequency channels extracted from the image adaptively, it is more flexible and versatile than the commonly used enhancement methods, such as unsharp masking, which uses a small number of frequency channels. Two types of multiscale methods have been used as preprocessing methods for the contrast enhancement of mammograms: the wavelet method and the Laplacian pyramid method. A previous study has shown that, for the purpose of image enhancement, the Laplacian pyramid method is advantageous compared with the fast wavelet transformation, which introduces visible artifacts. In this project, therefore, we chose the Laplacian pyramid method as our preprocessing method.

Fig. 1. Characteristics of our mass data set: (a) distribution of mass sizes, (b) distribution of mass shapes, (c) distribution of mass margins (C: circumscribed, Ind: indistinct, M: microlobulated, Ob: obscured, Sp: spiculated), (d) distribution of the breast density in terms of BI-RADS category estimated by an MQSA radiologist.

[Fig. 2 flowchart: Raw FFDM → Multiscale enhancement → Prescreening (gradient field analysis) → Identification of suspicious structures (clustering-based region growing) → Feature analysis → FP classification (rule-based classifier and LDA)]

Fig. 2. Schematic diagram of our CAD system for mass detection on FFDM. The system is developed for screening mammography, so all masses, whether malignant or benign, are considered positive. The FP classification stage includes rule-based classification, a morphological LDA classifier, and a texture feature LDA classifier for differentiating masses from normal breast tissues.

A flowchart of our preprocessing method is shown in Fig. 3. In brief, the mammogram is first segmented automatically into the background and the breast region. Second, a logarithmic transform is applied to the breast image. The Laplacian pyramid method is then used to decompose the breast image into multiple scales. A nonlinear weight function based on the pixel gray level of each of the low-pass components is designed to enhance the high-pass components.

Fig. 3. Schematic diagram for the image preprocessing stage of our mass detection system, which includes breast boundary segmentation, logarithmic image transformation, and Laplacian pyramid multiscale enhancement.

Since the contrast between the breast and the background in a raw FFDM is high, a two-step algorithm was developed for the segmentation of the breast region. First, Otsu's method39 is used to calculate a threshold and binarize the original image. Second, an eight-connectivity labeling method is used to identify the connected regions below the threshold on the binary image. The region with the largest area is considered to be the breast region.
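A minimal sketch of the two-step breast segmentation described above, assuming the raw FFDM is a 2D NumPy array in which the attenuated breast pixels fall below the Otsu threshold. The 256-bin histogram and the breadth-first labeling helper are illustrative choices, not the authors' implementation.

```python
import numpy as np
from collections import deque


def otsu_threshold(img, nbins=256):
    """Otsu threshold: the bin edge that maximizes the between-class variance."""
    hist, edges = np.histogram(np.ravel(img), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    p = hist / hist.sum()
    w0 = np.cumsum(p)              # probability of the lower class
    mu = np.cumsum(p * centers)    # cumulative first moment
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 1e-12)
    sigma_b = np.zeros_like(w0)
    sigma_b[valid] = (mu[-1] * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    k = int(np.argmax(sigma_b))
    return edges[k + 1]            # pixels below this value form the lower class


def largest_component(mask):
    """Largest 8-connected region of a boolean mask (breadth-first labeling)."""
    rows, cols = mask.shape
    labels = np.zeros(mask.shape, dtype=int)
    best_label, best_size, current = 0, 0, 0
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0, c0] or labels[r0, c0]:
                continue
            current += 1
            labels[r0, c0] = current
            queue, size = deque([(r0, c0)]), 0
            while queue:
                r, c = queue.popleft()
                size += 1
                for dr in (-1, 0, 1):          # visit all 8 neighbors
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols and mask[rr, cc] and not labels[rr, cc]:
                            labels[rr, cc] = current
                            queue.append((rr, cc))
            if size > best_size:
                best_label, best_size = current, size
    if best_size == 0:
        return np.zeros(mask.shape, dtype=bool)
    return labels == best_label


def segment_breast(raw):
    """Step 1: Otsu binarization; step 2: keep the largest region below threshold."""
    raw = np.asarray(raw, dtype=float)
    return largest_component(raw < otsu_threshold(raw))
```

In this sketch a small isolated low-valued patch (for example, a label or marker) is discarded because only the largest connected region is kept.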

Clinical mammograms are usually viewed in a negative mode of the raw images. In order to process an image with the same format as the clinical mammograms, we first use an inverted logarithmic function40 to transform the raw data. A multiresolution method is then used to enhance the log-transformed image. The inverted logarithmic function for signal transfer can be expressed as

$$S' = \ln\left(\frac{X_{\max}}{X}\right), \tag{1}$$

where $X$ is the gray level of the raw data and $X_{\max}$ is the maximum of the 14 bit digital gray scale (i.e., 16 383). The transformed image is then linearly scaled to 12 bit pixel values.
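Eq. (1) and the subsequent 12 bit rescaling can be sketched as follows. Two details are assumptions because the text does not specify them: zero-valued raw pixels are clipped to 1 so the logarithm stays finite, and the linear scaling maps the minimum and maximum of the transformed image to 0 and 4095.

```python
import numpy as np

X_MAX = 16383  # maximum of the 14 bit raw gray scale


def inverted_log_12bit(raw, out_max=4095):
    """Eq. (1), S' = ln(X_max / X), followed by linear scaling to 12 bits."""
    x = np.clip(np.asarray(raw, dtype=float), 1.0, X_MAX)  # assumption: avoid ln of 0
    s = np.log(X_MAX / x)  # high raw value (high exposure) -> small S', i.e., negative mode
    lo, hi = s.min(), s.max()
    if hi == lo:
        return np.zeros_like(s)
    return np.round((s - lo) / (hi - lo) * out_max)  # assumption: min/max stretch
```

The inversion is what makes the attenuating breast tissue bright, matching the appearance of clinical mammograms.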

The Laplacian pyramid decomposition is a multiscale method that was first introduced as an image compression technique. We previously evaluated the effect of Laplacian pyramid data compression on the detection of microcalcifications on digitized mammograms.41 An illustration of a Laplacian decomposition tree is shown on the left-hand side of Fig. 4. The Laplacian pyramid is a sequence of error images $L_0, L_1, \ldots, L_n$. Each is the difference between two consecutive levels of the Gaussian pyramid $G_0, G_1, \ldots, G_n$, where $G_0$ is the original image. Each subsequent level of the Gaussian pyramid in the decomposition tree is generated by convolution of the image at the previous level with a 5 × 5 kernel, $w(m,n)$, that has weights of 0.4 at the center, 0.25 at the eight nearest neighbors of the center, and 0.05 at the 16 peripheral pixels, and then downsampled by a factor of 2, as described in Eq. (4). The decomposition of the image from level $k$ to level $k+1$ can be expressed mathematically by

$$L_k = G_k - \mathrm{Expand}(G_{k+1}), \tag{2}$$

where

$$\mathrm{Expand}(G_{k+1})(i,j) = 4\sum_{m=-2}^{2}\sum_{n=-2}^{2} w(m,n)\, G_{k+1}\!\left(\frac{i-m}{2}, \frac{j-n}{2}\right), \tag{3}$$

$$G_k(i,j) = \sum_{m=-2}^{2}\sum_{n=-2}^{2} w(m,n)\, G_{k-1}(2i+m,\, 2j+n). \tag{4}$$

In Eq. (3), only the terms for which $(i-m)/2$ and $(j-n)/2$ are integers contribute to the sum.

The original image can be recovered by following the Gaussian reconstruction tree shown on the right-hand side of Fig. 4 if no enhancement is applied to the Laplacian pyramid. At a given level of the Gaussian reconstruction tree, the image is expanded (convolved and upsampled), as shown in Eq. (3), and then added to the Laplacian error image of the corresponding level. Details of the decomposition and reconstruction processes can be found in the literature.37

Fig. 4. Illustration of the Laplacian decomposition tree on the left-hand side and the Gaussian reconstruction tree on the right-hand side. The different levels of the Gaussian pyramid images are denoted by $G_i$ ($i = 0, \ldots, n$). The error images at different levels of the Laplacian pyramid are denoted by $L_i$ ($i = 0, \ldots, n$). The primed quantities $G'_i$ and $L'_i$ denote the images at different levels after enhancement. $\Sigma$ denotes the summation operation. The image is downsampled by a factor of 2 each time it goes down a level of the decomposition tree, and upsampled by a factor of 2 each time it moves up a level of the reconstruction tree.
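The decomposition of Eqs. (2)-(4) and its exact inverse can be sketched in NumPy. One assumption needs stating: this sketch builds $w(m,n)$ as the separable outer product of the 1D weights (0.05, 0.25, 0.4, 0.25, 0.05), the classic Burt-Adelson generating kernel, because assigning 0.4, 0.25, and 0.05 literally to the center, 8 neighbors, and 16 peripheral pixels of a 2D kernel would give weights summing to 3.2 rather than 1. Reflective border padding and all function names are also illustrative choices.

```python
import numpy as np

# Assumption: separable kernel w(m, n) = W1[m] * W1[n] (Burt-Adelson, a = 0.4).
W1 = np.array([0.05, 0.25, 0.4, 0.25, 0.05])


def smooth(img):
    """Convolve with the 5 x 5 kernel w(m, n) using reflective borders."""
    h, w = img.shape
    p = np.pad(img, 2, mode="reflect")
    along_cols = sum(W1[m] * p[:, m:m + w] for m in range(5))
    return sum(W1[m] * along_cols[m:m + h, :] for m in range(5))


def pyr_reduce(g):
    """Eq. (4): smooth, then keep every second pixel."""
    return smooth(g)[::2, ::2]


def pyr_expand(g, shape):
    """Eq. (3): upsample by zero insertion, then smooth with 4 * w."""
    up = np.zeros(shape)
    up[::2, ::2] = g
    return 4.0 * smooth(up)


def laplacian_pyramid(img, levels):
    """Return the error images [L_0, ..., L_{levels-1}] and the top level G_levels."""
    gauss = [np.asarray(img, dtype=float)]
    for _ in range(levels):
        gauss.append(pyr_reduce(gauss[-1]))
    laps = [gauss[k] - pyr_expand(gauss[k + 1], gauss[k].shape)  # Eq. (2)
            for k in range(levels)]
    return laps, gauss[-1]


def reconstruct(laps, top):
    """Gaussian reconstruction tree without enhancement: G_k = Expand(G_{k+1}) + L_k."""
    g = top
    for lap in reversed(laps):
        g = pyr_expand(g, lap.shape) + lap
    return g
```

Because each $L_k$ is defined with the same Expand operator used in reconstruction, the unenhanced round trip reproduces the input up to floating-point error.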

We enhance the reconstructed image to facilitate mass detection. The image at each level of the Laplacian pyramid, which corresponds to a bandpass image, is mapped by a nonlinear function. In this study, we use a nonlinear function that incorporates the information from each bandpass image. A Gaussian pyramid expansion is then used to reconstruct the image from the low-pass components and the enhanced bandpass components, as shown in Fig. 4. The reconstruction scheme is defined by

$$G'_k = \alpha \cdot \mathrm{Expand}(G'_{k+1}) + \beta \cdot \left[\mathrm{Expand}(G'_{k+1})\right]^{p} \cdot L_k, \tag{5}$$

where $\alpha$, $\beta$, and $p$ are constants in the range of 0.2 to 2.0, chosen experimentally for each frequency level.
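A one-level sketch of Eq. (5), taking the expanded low-pass image $E = \mathrm{Expand}(G'_{k+1})$ and the Laplacian image $L_k$ as NumPy arrays. Clipping negative low-pass values before the power is an assumption made here to avoid NaNs with fractional $p$; the text gives only the 0.2 to 2.0 range for the constants. With $\alpha = \beta = 1$ and $p = 0$, Eq. (5) reduces to the unenhanced reconstruction implied by Eq. (2).

```python
import numpy as np


def enhance_level(expanded, lap, alpha, beta, p):
    """One step of Eq. (5): G'_k = alpha * E + beta * E**p * L_k, with E = Expand(G'_{k+1})."""
    e = np.asarray(expanded, dtype=float)
    weight = np.power(np.clip(e, 0.0, None), p)  # assumption: clip negatives before E**p
    return alpha * e + beta * weight * np.asarray(lap, dtype=float)
```

For $p > 0$ the bandpass detail is weighted by the local low-pass gray level, so detail in brighter regions of the log-transformed image receives more gain than detail in darker regions.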

Figures 5(a) and 5(b) show an example of a GE raw image and its processed image provided by the GE FFDM system. The histograms of the raw image and the processed image are shown next to the corresponding images. An example of the processed image using our multiresolution enhancement method and the corresponding histogram are shown in Fig. 5(c).

Fig. 5. An example of (a) GE raw image, (b) GE processed image, and (c) our processed image obtained with the Laplacian pyramid multiscale method. The gray level histogram of each image is also shown. The GE raw image has 14 bit gray levels, but the histogram plots only the lower 12 bits because very few pixels had gray levels higher than 4095.

2. Prescreening and segmentation of suspicious objects

In our previous CAD system developed for digitized SFMs, an adaptive density-weighted contrast enhancement (DWCE) filter35 was developed for prescreening. Although the DWCE filter using the gray level information can identify the suspicious locations of masses on mammograms with high sensitivity, the prescreening objects often include a large number of enhanced normal breast structures.

In this study, we investigated the use of a new method that combines gradient field information and gray level information to detect mass candidates on FFDMs. Gradient field information is commonly used in computer vision and other fields to extract objects or intensity field distributions. Kobatake et al.42 designed a filter, referred to as an iris filter, to calculate the convergence index of the gradient around each pixel on SFMs, which provided shape information for the detection of masses. An extension of the iris filter, referred to as an adaptive ring filter, was developed by Wei et al.43 for the detection of lung nodules on chest x-ray images. In this study, we have developed a two-stage gradient field analysis method that not only uses the shape information of masses on mammograms but also, in the second stage, incorporates the gray level information of the local object segmented by a region growing technique to refine the gradient field analysis.

To reduce noise in the gradient calculation, the image is smoothed with a 4 × 4 box filter and subsampled to 400 μm × 400 μm pixels. The gradient field analysis is applied to the smoothed image. At each pixel c(i) within the breast, concentric annular regions centered at c(i), with an average radius R(k) of k pixels from c(i) and a radial width of 4 pixels, are defined within a circular region of about 12 mm in radius. The gradient vector at each pixel p(j) within an annular region is computed, and the gradient direction is obtained by projecting the gradient vector onto the radial direction vector from c(i) to p(j). The average gradient direction over an annular region at the average radius R(k) is calculated as the mean of the gradient directions over pixels on three adjacent annular regions, R(k−1), R(k), and R(k+1). Finally, the gradient field convergence at c(i) is determined as the maximum of the average gradient directions among all annular regions. A region of interest (ROI) of 256 × 256 pixels in the 100 μm × 100 μm images is identified, with its center placed at each location of high gradient convergence. The object in each ROI is segmented by a region growing method44 in which the location of high gradient convergence is used as the starting point. After region growing, all connected pixels constituting the object are labeled. Finally, the gradient convergence at the center location of the ROI is recalculated within the segmented object. Objects whose new gradient convergence is lower than 80% of the original value are rejected.

After prescreening, the suspicious objects are identified by a two-stage segmentation method. First, the background-corrected ROI is weighted by a Gaussian function with σ = 256 pixels. Then, k-means clustering, using the pixel values in the background-corrected image and in a Sobel-filtered image as features, is used to find the object. Figures 6(a) and 6(b) show the initial detection locations and the grown objects, respectively, obtained by prescreening the mammogram shown in Fig. 5(c).
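The gradient field convergence at a single pixel can be sketched as below. It follows the steps described above (annuli of radial width 4 pixels, pooling over three adjacent annuli, maximum over radii), but several details are assumptions: the unit gradient is projected onto the direction pointing from p(j) back toward c(i), so a bright, roughly round object in the log-transformed image scores near +1; gradients come from `np.gradient`; and the optional `mask` argument is one hypothetical way to restrict the second-stage recomputation to the region-grown object. The function name and parameters are illustrative.

```python
import numpy as np


def gradient_convergence(img, center, r_max=30, width=4, mask=None):
    """Hypothetical sketch of the gradient field convergence at pixel `center`.

    For each average radius k, pixels in the annuli R(k-1), R(k), R(k+1)
    (each `width` pixels wide) are pooled, and the mean cosine between the
    gradient and the direction from the pixel toward the center is taken.
    The result is the maximum of these means over k = 1..r_max.
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)                      # derivatives along rows, columns
    rows, cols = np.indices(img.shape)
    dy, dx = rows - center[0], cols - center[1]    # vector from c(i) to p(j)
    dist = np.hypot(dx, dy)
    norm = np.hypot(gx, gy) * dist
    cos = np.zeros_like(img)
    ok = norm > 0
    # Minus sign: project onto the direction from p(j) back toward c(i).
    cos[ok] = -(gx[ok] * dx[ok] + gy[ok] * dy[ok]) / norm[ok]
    allowed = dist > 0 if mask is None else (dist > 0) & mask
    best = -np.inf
    for k in range(1, r_max + 1):
        # Union of the three adjacent annuli centered on radii k-1, k, k+1.
        ring = allowed & (dist >= k - 1 - width / 2) & (dist < k + 1 + width / 2)
        if ring.any():
            best = max(best, float(cos[ring].mean()))
    return best
```

With 400 μm pixels, `r_max=30` corresponds to the 12 mm search radius. For the second stage, a candidate would be rejected when the masked value falls below 80% of the first-stage value, e.g. `gradient_convergence(img, c, mask=obj) < 0.8 * gradient_convergence(img, c)`.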


3. Feature extraction and FP reduction

FP classification in our mass detection system is accomplished by a three-stage classification scheme.36,44 For each suspicious object, eleven morphological features are extracted. Rule-based classification and a linear discriminant analysis (LDA) classifier using all 11 morphological features as input predictor variables are trained to remove the detected structures that are substantially different from breast masses. The training data set alone was used for training the classification rules and the weights of the LDA classifier. After morphological classification, global and local multiresolution texture analyses45 are performed in each remaining ROI by using the spatial gray level dependence (SGLD) matrix. Briefly, the wavelet transform is employed to decompose an ROI into three levels for global texture analysis. Thirteen types of texture features44,46 are extracted from each ROI. Each feature is calculated at 14 pixel distances and 2 angular directions. A total of 364 features (13 texture measures × 14 distances × 2 directions) is extracted from global texture analysis. Local texture features are extracted from the local region containing the detected object (object region) and the peripheral regions within each ROI. A total of 208 features (104 features from the object region and 104 features from the peripheral regions) are extracted. The third-stage FP reduction using the texture features is described next.

Fig. 6. An example demonstrating the processing steps with our CAD system: (a) object locations identified in prescreening, (b) identified suspicious objects, (c) detected objects after FP reduction, and (d) image superimposed with ROIs identifying the detected objects. The true mass is indicated by an arrow.
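To make the feature count concrete: one SGLD matrix is formed for each pixel distance and direction, and each of the 13 texture measures is computed from it, giving 13 × 14 × 2 = 364 global features, or 364 + 208 = 572 features together with the local ones. The sketch below builds a symmetric, normalized SGLD matrix and evaluates two common Haralick measures, energy and contrast. Treating the two directions as 0° and 90°, and the function names, are assumptions for illustration.

```python
import numpy as np


def sgld_matrix(img, d, angle, levels):
    """Symmetric, normalized SGLD (co-occurrence) matrix for distance d.

    Assumption: only the 0 and 90 degree directions are supported here.
    """
    img = np.asarray(img, dtype=int)
    if angle == 0:
        a, b = img[:, :-d], img[:, d:]      # horizontal pixel pairs
    elif angle == 90:
        a, b = img[d:, :], img[:-d, :]      # vertical pixel pairs
    else:
        raise ValueError("this sketch supports only 0 and 90 degrees")
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1.0)
    m = m + m.T                             # count each pair in both orders
    return m / m.sum()


def energy(p):
    """Angular second moment: sum of squared matrix entries."""
    return float(np.sum(p ** 2))


def contrast(p):
    """Sum of (i - j)^2 weighted by the co-occurrence probabilities."""
    i, j = np.indices(p.shape)
    return float(np.sum((i - j) ** 2 * p))
```

The test image is the 4 × 4, four-gray-level example from Haralick's original description, whose horizontal distance-1 matrix is well known.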


4. Texture classification of masses and normal tissue

In order to obtain the best texture feature subset and reduce the dimensionality of the feature space to design an effective classifier, feature selection with stepwise LDA was applied. At each step, one feature was entered into or removed from the feature pool by analyzing its effect on the selection criterion, which was chosen to be Wilks' lambda in this study. The optimization procedure used a threshold $F_{in}$ for feature entry, a threshold $F_{out}$ for feature removal, and a tolerance threshold $T$ for excluding features that had high correlation with the features already in the selected pool. Since the appropriate values of $F_{in}$, $F_{out}$, and $T$ were unknown, we examined a range of values using an automated simplex optimization method. For a given combination of $F_{in}$, $F_{out}$, and $T$ values, the algorithm used a leave-one-case-out resampling method within the training subset to select features and estimate the weights for the LDA classifier. To evaluate the classifier performance, the test discriminant scores from the left-out cases were analyzed using receiver operating characteristic (ROC) methodology.47 The discriminant scores of the mass and normal tissue were used as the decision variable in the LABROC program, which fits a binormal ROC curve based on maximum likelihood estimation. The accuracy for classification of mass and normal tissue was evaluated as the area under the ROC curve, $A_z$. The test $A_z$ for the left-out cases in the leave-one-out resampling within the training subset was used as a figure of merit to guide the simplex algorithm in its search for the best set of $F_{in}$, $F_{out}$, and $T$ values within the parameter space. In this approach, feature selection was performed without the left-out case so that the test performance would be less optimistically biased. However, the selected feature set in each leave-one-case-out cycle could be slightly different because every cycle had one training case different from the other cycles. In order to obtain a single trained classifier to apply to the test subset, a final stepwise feature selection was performed with the entire training subset and a set of $F_{in}$, $F_{out}$, and $T$ thresholds chosen from the output of the simplex training process. This set of thresholds was chosen based not only on the test $A_z$ values, which were generated while the simplex procedure was searching through the parameter space, but also on the average number of features selected. The thresholds were chosen to balance keeping the number of selected features small against achieving a relatively high classification accuracy by LDA. The chosen thresholds were then applied to the entire training subset to obtain the final set of features by stepwise feature selection and to estimate the weights of the LDA. The LDA classifier with the selected feature set was then fixed and applied to the test subset. The test subset was independent of the training subset, as described in Sec. II B 2, and was not used in the above-described leave-one-case-out classifier training process.
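A compact sketch of stepwise feature selection with Wilks' lambda, an F-to-enter threshold ($F_{in}$), an F-to-remove threshold ($F_{out}$), and a tolerance check ($T$). It uses the standard partial-F statistic for a change in Wilks' lambda. The leave-one-case-out loop and the simplex search over $F_{in}$, $F_{out}$, and $T$ are omitted, and all names and default threshold values are illustrative rather than those used in this study.

```python
import numpy as np


def scatter_matrices(X, y):
    """Within-class (W) and total (T) scatter matrices of the feature matrix X."""
    centered = X - X.mean(axis=0)
    T = centered.T @ centered
    W = np.zeros_like(T)
    for c in np.unique(y):
        Xc = X[y == c] - X[y == c].mean(axis=0)
        W += Xc.T @ Xc
    return W, T


def wilks_lambda(W, T, S):
    """Wilks' lambda det(W_SS) / det(T_SS); equals 1 for an empty feature set."""
    if not S:
        return 1.0
    idx = np.ix_(S, S)
    return np.linalg.det(W[idx]) / np.linalg.det(T[idx])


def tolerance(W, S, j):
    """1 - R^2 of feature j regressed on the selected features (within classes)."""
    if not S:
        return 1.0
    w_sj = W[S, j]
    explained = w_sj @ np.linalg.solve(W[np.ix_(S, S)], w_sj)
    return (W[j, j] - explained) / W[j, j]


def stepwise_select(X, y, f_in=3.84, f_out=2.71, tol=1e-3, max_steps=100):
    """Forward entry and backward removal driven by the partial F of Wilks' lambda."""
    n, d = X.shape
    g = len(np.unique(y))
    W, T = scatter_matrices(X, y)
    S = []
    for _ in range(max_steps):
        lam, q = wilks_lambda(W, T, S), len(S)
        best_j, best_f = None, f_in
        for j in range(d):
            if j in S or tolerance(W, S, j) < tol:
                continue  # already selected or too correlated with the pool
            f_enter = (n - g - q) / (g - 1) * (lam / wilks_lambda(W, T, S + [j]) - 1)
            if f_enter > best_f:
                best_j, best_f = j, f_enter
        if best_j is None:
            break  # no remaining feature passes F_in
        S.append(best_j)
        lam, q = wilks_lambda(W, T, S), len(S)
        for j in list(S):
            rest = [k for k in S if k != j]
            f_remove = (n - g - q + 1) / (g - 1) * (wilks_lambda(W, T, rest) / lam - 1)
            if f_remove < f_out:
                S.remove(j)  # drop at most one feature per step
                break
    return S
```

Keeping $F_{out} < F_{in}$ prevents a feature from being removed immediately after it enters, since its F-to-remove at that point equals its F-to-enter.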

5. Evaluation methods

The detected individual objects were compared with the “truth” ROI marked by an experienced radiologist. A detected object was scored as a true positive (TP) if the overlap between the bounding box of the detected object and the truth ROI was over 25%; otherwise, it was scored as an FP. The 25% threshold was selected as described in our previous study.36 The detection performance of the CAD system was assessed by free-response ROC (FROC) analysis. FROC curves were presented on a per-mammogram and a per-case basis. For mammogram-based FROC analysis, the mass on each mammogram was considered an independent true object; the sensitivity was thus calculated relative to 220 masses. For case-based FROC analysis, the same mass imaged on the two-view mammograms was considered to be one true object, and detection of the mass on either or both views was considered to be a TP detection; the sensitivity was thus calculated relative to 110 masses. Figure 6(c) shows an example of the final detected objects, and Fig. 6(d) shows the locations of these objects superimposed on the mammogram.
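The scoring rules above can be sketched as follows. The text does not state the denominator of the 25% overlap; this sketch assumes intersection over union of the two boxes, which is a hypothetical choice. The sensitivity helper shows how the same per-view detections give an image-based sensitivity (relative to 220 masses in this study) and a case-based sensitivity (relative to 110 masses).

```python
def box_overlap_fraction(det, truth):
    """Overlap of two (r0, c0, r1, c1) boxes; assumption: intersection over union."""
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    r0, c0 = max(det[0], truth[0]), max(det[1], truth[1])
    r1, c1 = min(det[2], truth[2]), min(det[3], truth[3])
    inter = max(0, r1 - r0) * max(0, c1 - c0)
    union = area(det) + area(truth) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(det, truth, threshold=0.25):
    """TP if the overlap exceeds the 25% threshold; otherwise the mark is an FP."""
    return box_overlap_fraction(det, truth) > threshold


def sensitivities(view_hits):
    """view_hits maps a case id to one boolean per view (e.g., CC and MLO)."""
    n_views = sum(len(v) for v in view_hits.values())
    image_based = sum(sum(v) for v in view_hits.values()) / n_views
    case_based = sum(any(v) for v in view_hits.values()) / len(view_hits)
    return image_based, case_based
```

Case-based sensitivity is never lower than image-based sensitivity, because a mass missed on one view still counts if it is found on the other.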

To evaluate the effect of the preprocessing methods on mass detection, we also trained a CAD system using the GE processed image as input. This CAD system used the same methods as those described earlier for the raw images, except that the Laplacian pyramid preprocessing step was not applied to the GE processed image and the prescreening and feature classifiers were retrained specifically for the GE processed images to obtain the best performance. The training and test subsets contained the same corresponding cases as the raw image subsets. The training and testing were performed using the above-described cross validation method. The performance of the CAD system using the GE processed images was quantified by the average test FROC curve and compared with that using the raw images.

[Fig. 7 plot: test ROC curves; x axis: False Positive Fraction]

Fig. 7. The test ROC curves from the two independent mass subsets. The LDA classifiers using texture features achieved an $A_z$ value of 0.89 ± 0.02 for test subset 1 and 0.85 ± 0.02 for test subset 2 in the classification of mass and normal breast tissues.

III. RESULTS

With raw images as input and Laplacian pyramid enhancement, our CAD system using the two-stage gradient field analysis detected 92.7% (204/220) of the masses with an average of 18.9 (4152/220) objects/image at the prescreening stage, compared with an average of 23.8 objects/image at the same sensitivity using gradient field information alone. After FP reduction using the rule-based and linear classifiers based on morphological features, a total of 3412 mass candidates (15.5 objects/image) remained at a sensitivity of 90.5% (199/220).

The texture-based LDA classifier for FP reduction was designed with stepwise feature selection and simplex optimization. The most effective subset of features from the available feature pool was selected for each of the training subsets during the training procedure. Twenty (11 global and 9 local) and 19 (12 global and 7 local) texture features were selected from the two independent training subsets, respectively. The test ROC curves are shown in Fig. 7. The training $A_z$ values of the LDA classifier on the two training subsets were 0.87 ± 0.02 and 0.88 ± 0.01, respectively. The classifiers achieved $A_z$ values of 0.89 ± 0.02 and 0.85 ± 0.02 on the independent test subsets, respectively. Figure 8 shows the FROC curves for the two test subsets after FP reduction with the corresponding trained LDA classifiers. An average FROC curve was derived from these two FROC curves by averaging the FP/image at the corresponding sensitivities. This average test FROC curve is plotted in Fig. 9 for comparison with the other FROC curves, described next.

Fig. 8. The test FROC curves from the two independent mass subsets for the CAD system using the raw images as input and processed with the Laplacian pyramid method. The FP rate was estimated from the mammograms with masses. (a) Image-based FROC curves, (b) case-based FROC curves.
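Averaging two FROC curves by averaging the FP/image at corresponding sensitivities can be sketched with interpolation on a common sensitivity grid. Linear interpolation between operating points is an assumption of this sketch; `np.interp` also clamps to the end values outside each curve's sensitivity range, so the grid should stay within the range shared by all curves.

```python
import numpy as np


def average_froc(curves, sensitivity_grid):
    """Average FP/image of several FROC curves at common sensitivity values.

    Each curve is a pair (fp_per_image, sensitivity) of equal-length sequences.
    Assumption: FP/image varies linearly between measured operating points.
    """
    fp_at_grid = []
    for fp, sens in curves:
        fp, sens = np.asarray(fp, dtype=float), np.asarray(sens, dtype=float)
        order = np.argsort(sens)  # np.interp needs increasing x values
        fp_at_grid.append(np.interp(sensitivity_grid, sens[order], fp[order]))
    return np.mean(fp_at_grid, axis=0)
```

Averaging along the FP axis at fixed sensitivity keeps the averaged curve directly comparable to the sensitivity levels (70%, 80%, 90%) reported below.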

In addition to using the mass data set containing 110 cases for the cross validation training and testing, we used a no-mass data set containing 90 cases with 180 images to evaluate the FP detection rate in normal cases. Since two sets of trained parameters were obtained from the cross validation training, we applied the two trained CAD systems separately to the no-mass data set for FP detection. The number of FP marks produced by the algorithm was determined by counting the detected objects on these normal cases only. The mass detection sensitivity was determined by counting only the abnormal objects on each of the test mass subsets. Combining the sensitivity from each test mass subset with the FP rate from the normal data set at the corresponding detection thresholds yielded a test FROC curve. The two test FROC curves were then averaged, as described earlier, to obtain an overall FROC curve quantifying the test performance of the CAD system. Figures 9(a) and 9(b) compare the average FROC curves with the FP rates estimated from the two data sets. The test FROC curve with the FP rate estimated from the no-mass data set showed a case-based detection sensitivity of 70%, 80%, and 90% at 0.85, 1.31, and 2.14 FP marks/image, which are slightly higher than the FP rates of 0.7, 1.1, and 1.8 marks/image, respectively, estimated from the mass data set. Since our mass detection algorithm limits the maximum number of output marks to 3 per image at the final stage, and on a mammogram with a detected mass one of these marks is a TP, the FP marker rates will be slightly higher if the detection is performed on no-mass images. However, many images do not reach the maximum of 3 marks, so the difference in the FP marker rate between the mass and no-mass sets is less than one mark per image. We also analyzed the detection accuracy of the system for malignant and benign masses separately. Figures 10(a) and 10(b) show the average FROC curves for detection of malignant and benign masses.

The average test FROC curves of the CAD system using the GE processed images as input were compared with those of the CAD system using raw images as input and Laplacian pyramid multiscale preprocessing, as shown in Fig. 9. The FROC curves plot the detection sensitivity as a function of the number of FP marks per image on the mass data set. The CAD system using the GE processed images as input achieved a case-based sensitivity of 70%, 80%, and 90% at 0.9, 1.6, and 3.1 FP marks/image, respectively, compared with 0.7, 1.1, and 1.8 FP marks/image for the CAD system using raw images as input.

[Fig. 9 plot: x axis: Number of False Positives per Image]

Fig. 9. Comparison of the average test FROC curves obtained from: (1) the CAD system using raw images as input, with the FP rate estimated from the mammograms with masses; (2) the CAD system using raw images as input, with the FP rate estimated from the normal mammograms without masses; and (3) the CAD system using GE processed images as input, with the FP rate estimated from the GE processed mammograms with masses. (a) Image-based FROC curves, (b) case-based FROC curves.

[Fig. 10 plot: x axis: Number of False Positives per Image]

Fig. 10. Comparison of the average test FROC curves for the malignant and benign mass sets. The CAD system using raw images as input was used, and the FP rate was estimated from the mammograms without masses. (a) Image-based FROC curves, (b) case-based FROC curves.


IV. DISCUSSION

Several FFDM systems have been approved for clinical applications. It is important to develop a CAD system that can easily be adapted to images acquired by FFDM systems from different manufacturers. In this study, we are developing a CAD system that uses the raw FFDMs as input. Since digital detectors generally have a linear response to x-ray exposure, the raw pixel values are a linear function of the absorbed x-ray energy in the detector. The signal ranges of different digital detectors can therefore be normalized linearly with respect to each other. Although the spatial resolution and noise properties of the images from different detectors still differ, the use of raw images already removes one of the major differences between mammograms from different FFDM systems. For preprocessing of the raw images, we developed a multiresolution enhancement method. A typical mammogram processed by the GE method and by our method is compared in Fig. 5. As seen from this example, the enhancement of mammographic structures was stronger for our processed image than for the GE processed image. A comparison of their histograms showed that the two histograms are very similar except for the average gray level.

Table I. Estimation of the statistical significance of the difference between the FROC performance of the CAD system using the FFDM raw images as input and processed with our Laplacian pyramid method and that of the CAD system using GE processed images as input. The FROC curves with the FP rates obtained from the no-mass data set (Fig. 9) were compared.

                     A1 (AFROC)                                  FOM (JAFROC)
                     Test subset 1   Test subset 2   p value     Test subset 1   Test subset 2   p value
Raw+LP processed     0.44            0.39            0.012       0.46            0.41            0.006
GE processed         0.37            0.31            0.0009      0.39            0.34            0.012

For the evaluation of the effect of the preprocessing methods on computerized mass detection, we observed that our Laplacian pyramid preprocessing method provided higher detection accuracy than the GE processing method. As shown in Fig. 5, the Laplacian pyramid preprocessing method applies a stronger edge enhancement to the image than the GE method. Our preprocessing method aims at enhancing the image structures for computer vision, whereas the GE processing method was designed to enhance the image for human visual interpretation. The stronger enhancement used for preprocessing the raw images appeared to improve the accuracy of the computer in detecting the masses.

Currently, there is no established statistical analysis method for testing the significance of the difference between two FROC curves generated by a CAD system. Chakraborty et al. proposed the alternative free-response ROC (AFROC) method49 to transform FROC data into AFROC data, to which the curve fitting software and statistical significance tests for ROC analysis can then be applied, and demonstrated its application to human observer performance rating data. In the AFROC method, false-positive images (FPIs), instead of FPs per image, are counted. The confidence rating of an FPI is determined by the highest-confidence FP decision on the image, regardless of how many lower-confidence FP decisions are made on the same image. We applied the AFROC method to evaluate the differences in pairs of our FROC curves that used the no-mass set for estimation of the FP rates. The ROCKIT software developed by Metz et al.47 was used to analyze the AFROC data. The comparison of $A_1$ and p values is summarized in Table I. The area under the fitted AFROC curve ($A_1$) was 0.44 and 0.39 on mass test subsets 1 and 2, respectively, for the CAD system using raw images as input and processed with our Laplacian pyramid method, and 0.37 and 0.31, respectively, on the same subsets for the CAD system using GE processed images as input. The difference between the fitted AFROC curve for our processed images and that for the GE processed images was statistically significant (p < 0.05) for both test subsets. However, all four fitted AFROC curves deviated systematically from the AFROC data (see the two examples for test subset 1 plotted in Fig. 11). It is uncertain whether the AFROC method is applicable to our FROC data and thus whether the statistical significance testing is valid.
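The FROC-to-AFROC transformation described above can be sketched as follows: each no-mass image is summarized by its highest-rated FP mark, and each lesion by the rating of its TP mark. The sketch computes only empirical AFROC operating points; fitting a binormal curve, as ROCKIT does, is not shown, and the data layout is hypothetical.

```python
def afroc_points(normal_images, lesion_ratings, thresholds):
    """Empirical AFROC operating points (FPI fraction, lesion localization fraction).

    normal_images: one list of FP mark ratings per no-mass image (may be empty).
    lesion_ratings: rating of the TP mark for each lesion, or None if unmarked.
    """
    # The highest-confidence FP on an image defines that image's FPI rating;
    # lower-rated FP marks on the same image are ignored.
    fpi = [max(r) if r else None for r in normal_images]
    points = []
    for t in thresholds:
        fpf = sum(1 for r in fpi if r is not None and r >= t) / len(fpi)
        tpf = sum(1 for r in lesion_ratings if r is not None and r >= t) / len(lesion_ratings)
        points.append((fpf, tpf))
    return points
```

Because only one rating per normal image survives, the x axis becomes the probability of at least one FP per image, which matches the abscissa of Fig. 11.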

More recently, Chakraborty et al.50 proposed a JAFROC method and provided software to estimate the statistical significance of the difference between two FROC curves. We also applied the JAFROC analysis to the two pairs of FROC curves. The figure of merit (FOM) from the output of the JAFROC software was 0.46 and 0.41, respectively, on mass test subsets 1 and 2 for the CAD system using raw images as input and processed with our Laplacian pyramid method, and 0.39 and 0.34, respectively, on the same subsets for the CAD system using GE processed images as input. The difference between the FOM for our processed images and that for the GE processed images was again statistically significant (p < 0.05). The FOM values were about 0.02 higher than the corresponding A1 values. The JAFROC software did not provide a fitted curve or a goodness-of-fit indicator in its output, so it is not known whether this model fits our FROC data better than the AFROC method. Although both methods indicate that the improvement in FROC performance using our Laplacian pyramid processed images is statistically significant, further investigations are needed to determine whether these models are valid for analyzing the FROC performance of CAD systems.

Fig. 11. Comparison of alternative free-response receiver operating characteristic (AFROC) curves (horizontal axis: probability of at least one false positive per image). The raw curves were transformed from the FROC curves of mass detection on test subset 1 using either the raw images as input and processed with the Laplacian pyramid method (LP) or the GE processed images as input. The FP rate was estimated from the mammograms without masses. The fitted AFROC curves were obtained by applying the ROCKIT program to the transformed AFROC data.
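The sketch below illustrates what the JAFROC figure of merit measures by computing it as a Wilcoxon-type statistic. It assumes the originally published definition. Each normal image contributes its highest-rated FP mark, and each lesion contributes the rating of the mark that localized it. Unmarked images and lesions are assigned negative infinity, and ties count one half. This is not the JAFROC software used above, and the function name and inputs are hypothetical.

```python
import math
from typing import Sequence


def jafroc_fom(normal_max_fp: Sequence[float], lesion_ratings: Sequence[float]) -> float:
    """Wilcoxon-type JAFROC figure of merit (illustrative sketch).

    normal_max_fp: highest FP rating on each normal image
                   (-math.inf for a normal image with no marks).
    lesion_ratings: rating of the mark that localized each lesion
                    (-math.inf for an undetected lesion).
    """
    total = 0.0
    for x in normal_max_fp:
        for y in lesion_ratings:
            if y > x:
                total += 1.0   # lesion out-ranks the worst FP on this normal image
            elif y == x:
                total += 0.5   # ties (including two unmarked entries) count one half
    return total / (len(normal_max_fp) * len(lesion_ratings))
```

For example, `jafroc_fom([0.2, 0.5, -math.inf], [0.9, 0.4, -math.inf])` compares every lesion with every normal image. Of the 9 comparisons, 5.5 favor the lesion, so the result is about 0.61.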

Prescreening is an important step in a CAD system. A number of researchers have developed methods for the detection of suspicious masses on SFMs and CRs. The previous methods produced between 10 and 30 FPs/image at a mass detection sensitivity of approximately 90%. However, it is difficult to compare the effectiveness of the different methods because of differences in the image recording systems and in the data sets. In this study, we developed a new method for prescreening mass candidates on the FFDMs. It combines gradient field information, an approach originally developed for the detection of lung nodules on chest x-ray images,43 with gray level information.44 The new method produced 18.9 objects/image at 93% sensitivity in the prescreening step, compared with an average of 23.8 objects/image at the same sensitivity when gradient field information alone was used.
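The sketch below illustrates what "gradient field information" captures by computing a simple radial gradient convergence score at a candidate location. The score is the gradient-magnitude-weighted mean cosine between the image gradient and the direction pointing toward the candidate, taken over a disk. It assumes that masses appear as locally brighter, roughly round regions, so their gradients point inward and the score approaches 1 at the center. This is a generic illustration, not the authors' two-stage prescreening method. The function name and radius are hypothetical, and the gray-level refinement is omitted.

```python
import numpy as np


def radial_convergence(image: np.ndarray, cy: int, cx: int, radius: int) -> float:
    """Magnitude-weighted mean cosine between the gradient and the direction
    toward (cy, cx), over a disk of the given radius (illustrative only)."""
    gy, gx = np.gradient(image.astype(float))   # derivatives along rows, columns
    rows, cols = np.indices(image.shape)
    dy, dx = cy - rows, cx - cols                # vectors pointing to the candidate
    dist = np.hypot(dy, dx)
    mag = np.hypot(gy, gx)
    mask = (dist > 0) & (dist <= radius) & (mag > 1e-12)
    if not mask.any():
        return 0.0
    cos = (gy[mask] * dy[mask] + gx[mask] * dx[mask]) / (mag[mask] * dist[mask])
    # Weight by gradient magnitude so that flat regions contribute little.
    return float(np.sum(mag[mask] * cos) / np.sum(mag[mask]))
```

Evaluating this score at every pixel and keeping local maxima above a threshold would yield a candidate list. Thresholds of that kind are exactly what the combination with gray level information is meant to refine.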

The texture features in this study were extracted by using the SGLD matrix. A total of 572 features were included in our initial feature pool. These features were also used by our CAD system previously developed for SFMs. An average of 19.5 features were selected by using a stepwise feature selection method. The Az values for the LDA classifiers were 0.87 ± 0.02 and 0.88 ± 0.01 on the two training subsets, and 0.89 ± 0.02 and 0.85 ± 0.02 on the test subsets, respectively. The test Az for the first test subset was slightly higher than the Az for its training subset. This may indicate that some relatively easy cases were assigned, by chance, to that test set during random partitioning.

We also investigated whether other features could improve the performance of our CAD system. The feature spaces that we examined included features extracted from principal component analysis applied to the ROI image, run length statistics texture features extracted from the ROI images, and combinations of one or both of these feature spaces with the SGLD feature space. However, the test results showed that an LDA classifier designed in the SGLD feature space alone provided the best performance. This was found to be true both for our CAD mass detection system for SFMs developed previously and for the current system for FFDMs. Even so, it is still difficult to conclude that the SGLD features are the best feature set for classification between breast masses and normal tissues.

One major concern with the SGLD feature space is that the feature values depend on the pixel pair distance and angular direction, which leads to a feature pool with a large number of features. Some features in such a large feature space may provide good performance in the classification of masses and normal structures by chance. We attempted to alleviate this problem by using an independent test set to evaluate the classifier performance. However, since we chose the overall system parameters with knowledge of the performance on the test sets, the evaluation would still amount to validation rather than true testing. We have verified that our CAD system for SFMs can achieve reasonable performance on a truly independent data set36 and in a prospective pilot clinical trial.16 The performance of the current CAD system for FFDMs will have to be evaluated similarly when independent data sets become available.
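The sketch below shows why the SGLD feature pool grows so quickly. One matrix is formed for every pixel-pair distance and angle, and several features are computed from each matrix. It assumes a symmetric, normalized matrix and implements three common Haralick-style features: energy, entropy, and contrast. The function names, the number of gray levels, and the distance and angle sets are illustrative, not those used in the study.

```python
import numpy as np

# Unit pixel offsets (d_row, d_col) for 0, 45, 90, and 135 degrees.
ANGLE_OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def sgld_matrix(img: np.ndarray, distance: int, angle: int, levels: int) -> np.ndarray:
    """Symmetric, normalized SGLD (co-occurrence) matrix.
    img must contain integer gray levels in [0, levels)."""
    dr = distance * ANGLE_OFFSETS[angle][0]
    dc = distance * ANGLE_OFFSETS[angle][1]
    rows, cols = img.shape
    p = np.zeros((levels, levels), dtype=float)
    for r in range(max(0, -dr), min(rows, rows - dr)):
        for c in range(max(0, -dc), min(cols, cols - dc)):
            i, j = img[r, c], img[r + dr, c + dc]
            p[i, j] += 1
            p[j, i] += 1   # count each pixel pair in both orders
    return p / p.sum()


def sgld_features(p: np.ndarray) -> dict:
    """Energy, entropy (bits), and contrast of a normalized SGLD matrix."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]
    return {
        "energy": float(np.sum(p ** 2)),
        "entropy": float(-np.sum(nz * np.log2(nz))),
        "contrast": float(np.sum((i - j) ** 2 * p)),
    }
```

As an example of the growth, three distances and four angles give 12 matrices, and three features per matrix already yield 36 features. Adding more features per matrix, more distances, and multiple regions and resolutions quickly reaches pools of several hundred features.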

The detection performance of a CAD system for malignant masses is more important than its performance for all masses. Figures 10(a) and 10(b) indicate that the sensitivity of the system is higher for malignant masses than for benign masses. This is consistent with our observation in previous studies of our CAD system for digitized SFMs.36 However, since our current data set contained only 23 malignant cases, the sensitivity estimated for this subset has a large statistical uncertainty. A larger data set is being collected to compare the detection performance of the CAD system between malignant and benign masses and also to classify malignant and benign masses.

Furthermore, CAD algorithms developed for SFMs have been proven to be useful as a second opinion to assist radiologists in mammographic interpretation. Because of the higher SNR and linear response of digital detectors, FFDM also has the potential to improve the sensitivity of breast cancer detection, especially in dense breasts. Several studies have been or are being conducted to compare FFDM with SFM in screening cohorts. It is also important to compare the performance of CAD systems between FFDMs and SFMs. A study is under way to compare the performance of the two systems on pairs of FFDM and SFM obtained from the same patients.51
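The sketch below makes the size of that uncertainty concrete by computing a 95% Wilson score interval for a sensitivity estimated from 23 cases. The detected count of 20 is hypothetical and is not a result from this study. The Wilson interval is one standard choice among several binomial intervals.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical example: 20 of 23 malignant cases detected (point estimate ~0.87).
low, high = wilson_interval(20, 23)
print(f"sensitivity 95% CI: {low:.2f} to {high:.2f}")
```

With these hypothetical counts, the interval runs from roughly 0.68 to 0.95, a width of about 28 percentage points.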

V.  CONCLUSION 

Several FFDM systems have been approved for clinical applications, so it is important to develop CAD systems for breast cancer detection in FFDM. In this work, we developed a CAD system that uses the raw FFDMs as the input. A multiresolution Laplacian pyramid enhancement method was devised to preprocess the raw FFDMs. A new prescreening method that combined gradient field analysis with gray level information was developed to identify mass candidates. Rule-based and LDA classifiers in a feature space consisting of morphological features and SGLD texture features were designed to differentiate masses from normal tissues. Our CAD system achieved case-based sensitivities of 70%, 80%, and 90% at an estimated 0.85, 1.31, and 2.14 FP marks/image, respectively, on normal cases. The results indicate that our mass detection CAD scheme can be useful for detecting masses on FFDMs. Studies are under way to further optimize the processing parameters, the feature extraction, and the classifiers for FP reduction. A comparison of the mass detection performance of our CAD system for FFDMs with that of our system for SFMs is also in progress.
In this study, our purpose was to improve the performance of our mass detection system by using a new dual system approach. This approach combines a computer-aided detection (CAD) system optimized with “average” masses with another CAD system optimized with “subtle” masses. The two single CAD systems have similar image processing steps, which include prescreening, object segmentation, morphological and texture feature extraction, and false positive (FP) reduction by rule-based and linear discriminant analysis (LDA) classifiers. A feed-forward backpropagation artificial neural network was trained to merge the scores from the LDA classifiers in the two single CAD systems and to differentiate true masses from normal tissue. For an unknown test mammogram, the two single CAD systems are applied to the image in parallel to detect suspicious objects. A total of three data sets were used for training and testing the systems. The first data set of 230 current mammograms, referred to as the average mass set, was collected from 115 patients. We also collected 264 mammograms, referred to as the subtle mass set, which were acquired one to two years before the current examinations of these patients. Both the average and the subtle mass sets were partitioned into two independent data sets in a cross-validation training and testing scheme. A third data set containing 65 cases with 260 normal mammograms was used to estimate the FP marker rates during testing. When the single CAD system trained on the average mass set was applied to the test set with average masses, the FP marker rates were 2.2, 1.8, and 1.5 per image at case-based sensitivities of 90%, 85%, and 80%, respectively. With the dual CAD system, the FP marker rates were reduced to 1.2, 0.9, and 0.7 per image, respectively, at the same case-based sensitivities. Statistically significant (p < 0.05) improvements in the free-response receiver operating characteristic curves were observed when the dual system and the single system were compared using the test sets with either average masses or subtle masses. © 2006 American Association of Physicists in Medicine.
I. INTRODUCTION

Breast cancer is one of the leading causes of cancer mortality among women.1 It has been reported that early diagnosis and treatment can significantly improve the chance of survival for patients with breast cancer.2-4 At present, the most successful method for the early detection of breast cancer is screening mammography.5 Various methods are being developed to improve the accuracy of breast cancer detection. Double reading by radiologists can reduce the miss rate of radiographic reading, but it increases the cost of mammographic screening. An alternative method is to use a trained computer-aided detection (CAD) system as a second reader.6,7 Recent clinical studies have shown that CAD systems are helpful for increasing radiologists' accuracy in detecting breast cancers.8-13

A large volume of literature has been published in the CAD area. CAD systems for mammography generally consist of two subsystems: a mass detection system and a microcalcification detection system. Detection of masses on mammograms is often more challenging than detection of microcalcifications. The mass detection systems to date have employed a single-system approach using various techniques for prescreening of mass candidates and classification of true and false positives.14-24 Our laboratory incorporated two-view mammographic information for improved differentiation of true masses and false positives and obtained promising preliminary results. However, development of new methods to improve the performance of mass detection systems remains an important area of CAD research.

The CAD systems developed so far have mostly been trained with masses seen on current mammograms (i.e., the mammograms on which the masses were detected by radiologists). An important purpose of a CAD system is to serve as a second reader that alerts radiologists to subtle cancers that may be overlooked. One way to study how well a CAD system detects subtle cancers that are likely to be missed by radiologists is to evaluate its accuracy in detecting missed cancers on prior mammograms. These are the mammograms from previous examinations on which the mass or cancer can be seen retrospectively but was considered negative or benign at the time of the examination.

Some researchers have investigated the performance change of CAD systems when using prior mammograms as input. In our study of mass detection on prior mammograms, we obtained a case-based sensitivity of 74% (20/27) for the malignant masses with 2.2 false positives (FPs) per image. te Brake et al. reported that their CAD system had a case-based sensitivity of 34% (22/65) at 1 FP per image for the cancers that appeared as masses or stellate lesions in the prior examinations. A commercial system (R2 ImageChecker) also reported detection of 42% (72/172) of the cancers in the prior years that were considered worthy of call-back in retrospect by expert mammographers, with about 2 FP marks/case. Zheng et al. reported that their CAD system trained with current mammograms did not perform optimally on prior mammograms, and vice versa. The same system trained with prior mammograms performed better in detecting the masses on prior mammograms. Recently, an assessment study compared the performance of two commercial systems and one research CAD system on current and prior mammograms. The results showed that the true positive (TP) fraction for the CAD systems on prior mammograms of 39 breasts with malignant masses ranged from 15% to 26% with 0.28 to 0.41 FP marks/image.

The detection performance reported in the different studies varies, probably because of differences in the data sets used. Even so, these studies indicate that the sensitivities of current CAD systems in detecting subtle masses on prior mammograms are substantially lower than those obtained on current mammograms. The masses on priors have subtle and possibly different features compared with those on current mammograms. The difficulty in recognizing these features may be one of the factors that cause oversight by both radiologists and CAD systems.

The goal of pattern recognition is to achieve the best possible classification performance in the task at hand. Researchers have shown that, for a class of objects with a wide range of characteristics, classification performance can be improved by using a combination of classifiers. In such a scheme, objects of certain characteristics are classified by one classifier using one set of features, and objects of different characteristics are classified by another classification scheme based on different features. An advantage of using a combination of classifiers is that it may stabilize the training of classifiers even with a relatively small sample size, because each classifier does not have to accommodate a wide range of characteristics and features.36,37 These observations motivated our interest in the design of a dual CAD system for mass detection.

The missed cancers on prior mammograms represent the difficult cases that are more likely to be missed by radiologists if similar cancers occur on screening mammograms. It is therefore important to improve the sensitivity of the CAD system in detecting these cancers. On the other hand, when a CAD system is applied to a new mammogram in clinical practice, it has to detect breast lesions of all degrees of subtlety effectively. However, it is difficult to train a single CAD system to provide optimal detection for all lesions over the entire spectrum of subtlety, because the classifiers have to make compromises to accommodate cancers with a wide range of characteristics. Therefore, we have been exploring a new dual CAD system approach. It combines a CAD system trained with retrospectively seen masses on prior mammograms with a CAD system trained with masses detected on current mammograms. In this paper, we describe the design of the dual CAD system and report our current results.

II.  MATERIALS  AND  METHOD 
A.  Data  sets 

All mammograms in this study were collected from patient files in the Department of Radiology at the University of Michigan with Institutional Review Board (IRB) approval. The mammograms were digitized with a LUMISYS 85 laser film scanner with a pixel size of 50 μm × 50 μm and 4096 gray levels. The scanner was calibrated to have a linear relationship between gray levels and optical densities (O.D.) from 0.1 to greater than 3 O.D. units. The nominal O.D. range of the scanner is 0-4. The full-resolution mammograms were first smoothed with a 2 × 2 box filter and subsampled by a factor of 2, resulting in images with a pixel size of 100 μm × 100 μm. These 100 μm × 100 μm images were used as the input to our CAD system.
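Smoothing with a 2 × 2 box filter and then keeping every second pixel in each direction is equivalent to averaging each non-overlapping 2 × 2 block, assuming the filter output is sampled at block-aligned positions. The NumPy sketch below works under that assumption. Dropping an odd trailing row or column is our choice, not a documented detail of the study, and the function name is hypothetical.

```python
import numpy as np


def box2_downsample(image: np.ndarray) -> np.ndarray:
    """Average non-overlapping 2 x 2 blocks (e.g., 50 um -> 100 um pixels)."""
    rows, cols = image.shape
    # Drop an odd last row/column so the image tiles exactly into 2 x 2 blocks.
    trimmed = image[: rows - rows % 2, : cols - cols % 2].astype(float)
    # View as (rows/2, 2, cols/2, 2) and average over each block's two axes.
    return trimmed.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))
```

The reshape avoids explicit loops, which matters for full-field mammograms with millions of pixels.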

We collected three data sets. The first data set contained 115 cases with confirmed masses. Each case included the current mammograms that prompted the radiologist to work up the mass. This is referred to as the “average” mass set. All of the cases in the average mass set had two mammographic views, the craniocaudal view and either the mediolateral oblique view or the lateral view, yielding a total of 230 mammograms. There were 115 masses (67 malignant and 48 benign) in this data set. Of these, 105 were biopsy-proven and 10 were determined to be benign by long-term follow-up.

The second data set was composed of the prior mammograms dated one to two years earlier than the mammograms of the same patients in the average mass set. Since the masses on prior mammograms are on average subtler than those on current mammograms, this data set is referred to as the “subtle” mass set. For 5 of the 115 patients, no mass or focal density could be identified on either view of the prior mammograms. Therefore, the subtle mass set was composed of 110 cases (62 malignant and 48 benign). For the purpose of training the subtle mass detection system, the subtle masses do not have to come from the same cases as the average mass set. However, we used the available prior mammograms for these mass cases in our database. Nineteen of the 110 cases had two prior mammographic examinations. Of the 129 examinations in the subtle mass set, 123 had two mammographic views and 6 had three views, giving a total of 264 mammograms. Many of the subtle masses on the prior mammograms could be identified only as a focal density corresponding to the location of the subsequently detected mass on the current mammograms. On 44 of the two-view prior mammograms, the mass location was evident on only one view. Table I summarizes the information for the average and subtle mass subsets.

Table I. Description of cases in the average and subtle mass data sets and the subsets for training and testing in the cross-validation scheme.

                                          Mass subset 1           Mass subset 2
                                        Average    Subtle       Average    Subtle
Total No. of cases                         57        54            58        56
Cases with two prior examinations          NA        10            NA         9
Exams with two views                       57        58            58        65
Exams with three views                      0         6             0         0
Total No. of images                       114       134           116       130
No. of negative images                      0        25             0        19
No. of mass images for training           114       109           116       111
No. of two-view pairs for testing          57        64            58        65
No. of images for testing                 114       128           116       130
No. of malignant masses                    36        33            31        29
No. of benign masses                       21        21            27        27

The third data set was composed of 260 normal bilateral two-view mammograms obtained from 65 patients. No masses were evident on these mammograms upon review by an experienced radiologist.

The two mass data sets were used to estimate the detection sensitivity, and the normal data set was used to estimate the FP marker rate. For the mass data sets, the true locations of the masses were identified by an experienced MQSA radiologist using all available imaging and clinical information. The radiologist also provided an estimate of the longest diameter of the mass, descriptors of its margin and shape, a visibility rating, and an estimate of the breast density in terms of BI-RADS category. Figure 1 shows the distributions of mass sizes, mass shapes, mass margins, and mass visibility. Visibility was rated on a 10-point scale, with 1 representing the most visible masses and 10 the most subtle masses relative to the cases seen in the radiologist's clinical practice. The masses had a mean size of 13.7 mm and a median of 12 mm in the average data set, and a mean of 9.7 mm and a median of 10 mm in the subtle data set. Figure 2 shows the breast density for both the normal data set and the mass data sets. As the distributions of the mass characteristics show, the average masses on the current mammograms and the subtle masses on the priors overlapped considerably. Nevertheless, on average, the subtle masses were smaller and less conspicuous on the mammograms.


B.  Methods 

To improve the sensitivity of detecting breast lesions of all degrees of subtlety, we developed a new dual system approach. It combines a system trained with average masses with another system trained with subtle masses. When the trained dual system is applied to an unknown mammogram, the two CAD systems are used in parallel to detect suspicious objects on that single mammogram; no prior mammogram is needed. The additional FPs from the use of the two systems are reduced by an information fusion stage. In the following discussions, we refer to the two systems separately trained with the average masses and the subtle masses as “single” CAD systems.
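According to the abstract and the Fig. 4 caption, the information fusion stage merges the LDA scores of the two single systems with a feed-forward backpropagation artificial neural network (BP-ANN). The sketch below illustrates that idea with a two-input network that has one hidden layer and is trained by plain batch gradient descent on a cross-entropy loss. The hidden-layer size, learning rate, number of epochs, and the synthetic scores are all assumptions for illustration. The paper's network architecture and training parameters are not given in this section, and the function names are hypothetical.

```python
import numpy as np


def train_fusion_ann(scores, labels, hidden=4, lr=0.5, epochs=3000, seed=0):
    """Train a 2-input BP-ANN on (n, 2) pairs of LDA scores.
    labels: 1 for a true mass, 0 for an FP object."""
    rng = np.random.default_rng(seed)
    x = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float).reshape(-1, 1)
    w1 = rng.normal(0.0, 0.5, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)                      # hidden layer
        out = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))    # sigmoid output
        # Backpropagate the mean cross-entropy loss (gradients computed before updates).
        d_out = (out - y) / len(x)
        d_h = (d_out @ w2.T) * (1.0 - h ** 2)
        w2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * x.T @ d_h
        b1 -= lr * d_h.sum(axis=0)
    return w1, b1, w2, b2


def fused_score(params, scores):
    """Merged score in [0, 1] for each object; higher means more mass-like."""
    w1, b1, w2, b2 = params
    h = np.tanh(np.asarray(scores, dtype=float) @ w1 + b1)
    return (1.0 / (1.0 + np.exp(-(h @ w2 + b2)))).ravel()


# Usage with synthetic (hypothetical) scores: masses score high on both systems.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(1.0, 0.5, (100, 2)), rng.normal(-1.0, 0.5, (100, 2))])
y = np.r_[np.ones(100), np.zeros(100)]
params = train_fusion_ann(x, y)
merged = fused_score(params, x)
```

Thresholding the merged score at different levels traces out the operating points from which FROC curves would be computed.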

We randomly separated the mass data sets by case into two independent subsets. Both the average and subtle mass subsets followed the same case grouping. This ensured that mammograms from the same case would not be separated into the training subset for one single CAD system and the test subset for the other in a cross-validation cycle. Table I shows the subsets of cases in the average and subtle mass data sets. Two-fold cross-validation was used for training and testing the algorithms. The training included selecting proper parameters for each single CAD system and for information fusion. Once the training with one mass subset was completed, the parameters were fixed for testing with the other mass subset. The training and test mass subsets were then switched, and the training and test processes were repeated.

The CAD systems were trained with single mammograms. To maximize the number of training images with masses, all images with a visible mass were included when the subtle mass subset was used as a training set, regardless of whether they were part of a two-view or three-view examination. However, when the subtle mass subset was used as a test set, only two views were included for each examination, because we used two-view mammograms to derive the case-based test performance. For examinations containing three views, we therefore included only two of the views in testing. We also included cases with the mass visible on only one of the two views. After the two-fold cross-validation testing, the overall detection performance was evaluated by combining the performances of the two test subsets. The trained algorithms with the fixed parameters were also applied to the normal set of mammograms, which was not used during training, to estimate the FP rate in screening mammograms.
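The sketch below illustrates two points from the procedure above. First, the random partition is done at the case level, so every image of a patient (current and prior, all views) lands in the same fold. Second, performance is scored per case. It assumes a mapping from image identifiers to case identifiers, and all identifiers and helper names are hypothetical. The case-based sensitivity helper uses the common convention that a mass counts as detected if it is marked on at least one of its views. The text does not spell out that rule here, so treat it as an assumption.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def two_fold_case_split(case_ids: Sequence[str], seed: int = 0) -> Tuple[Set[str], Set[str]]:
    """Randomly partition cases (not images) into two disjoint subsets."""
    ids = sorted(set(case_ids))
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return set(ids[:half]), set(ids[half:])


def images_in(subset: Set[str], image_to_case: Dict[str, str]) -> List[str]:
    """All images (current or prior, any view) belonging to the given cases."""
    return [img for img, case in image_to_case.items() if case in subset]


def case_based_sensitivity(detected_views: Dict[str, List[bool]]) -> float:
    """Fraction of cases whose mass is marked on at least one view (assumed rule)."""
    hits = sum(any(views) for views in detected_views.values())
    return hits / len(detected_views)
```

Because the split is drawn over case identifiers, the average and subtle mass subsets automatically share the same grouping, which is the property the text requires.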


1. Single CAD system overview

The major steps in the two single mass detection systems are similar, but the feature spaces and classifiers for FP reduction in each system were designed separately to suit the characteristics of average and subtle masses, respectively. The two systems are therefore described together in the following, but the differences will be pointed out whenever applicable. Each single CAD system consists of four processing steps: (1) prescreening of mass candidates, (2) segmentation of suspicious objects, (3) feature extraction and analysis, and (4) FP reduction by classification of normal tissue structures and masses. The block diagram for the detection scheme is shown in Fig. 3.

Fig. 1. The characteristics of the masses in our mass data set: (a) distribution of mass sizes, (b) distribution of mass visibility on a 10-point rating scale with 1 representing the most visible masses and 10 the most subtle masses relative to the cases seen in clinical practice, (c) distribution of mass shapes, (d) distribution of mass margins. C: circumscribed, Ind: indistinct, M: microlobulated, Ob: obscured, Sp: spiculated.

Fig. 2. The distribution of breast density in terms of BI-RADS categories (horizontal axis: breast density, categories 1 to 4) estimated by an MQSA radiologist.

Fig. 3. Schematic diagram of our single CAD system for mass detection. The FP classification stage includes rule-based classification, a morphological LDA classifier, and a texture feature LDA classifier for differentiating masses from normal breast tissues.

For the prescreening stage, we have developed a two-stage gradient field analysis method. It uses the shape information of masses on mammograms and, in the second stage, also incorporates the gray level information of the local object segmented by a region growing technique to refine the gradient field analysis.24,40 Locations of high radial gradient convergence are labeled as mass candidates. After prescreening, the suspicious objects are identified by using a two-stage segmentation method.41 First, the background-corrected ROI is weighted by a two-dimensional Gaussian function with σ = 256 pixels to enhance the central region. Sobel filtering is then applied to the Gaussian-weighted ROI to generate another enhanced image. Second, k-means clustering using the pixel values from these two images as features is used to segment the object. For each suspicious object, eleven morphological features were extracted. Rule-based and linear discriminant classifiers were trained by using the training data set only, to remove the detected structures that were substantially different from breast masses.
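The two-stage segmentation can be sketched as follows, assuming a background-corrected ROI is already available. The Gaussian weighting (σ = 256 pixels) and the Sobel filtering follow the description above. The following choices are our own assumptions: the feature standardization, k = 2 clusters, the deterministic initialization, and the rule that the cluster with the higher mean weighted intensity is the object. The background correction itself is not shown, and the function names are hypothetical.

```python
import numpy as np


def gaussian_weight(roi: np.ndarray, sigma: float = 256.0) -> np.ndarray:
    """Multiply the ROI by a 2-D Gaussian centered on the ROI center."""
    rows, cols = roi.shape
    r, c = np.indices(roi.shape)
    d2 = (r - (rows - 1) / 2.0) ** 2 + (c - (cols - 1) / 2.0) ** 2
    return roi * np.exp(-d2 / (2.0 * sigma ** 2))


def sobel_magnitude(img: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude with edge-replicated borders."""
    p = np.pad(img, 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)


def kmeans_segment(roi: np.ndarray, k: int = 2, iters: int = 20) -> np.ndarray:
    """Cluster pixels on (Gaussian-weighted value, Sobel magnitude) and return a
    boolean mask of the cluster with the highest mean weighted intensity."""
    weighted = gaussian_weight(roi.astype(float))
    edges = sobel_magnitude(weighted)
    feats = np.stack([weighted.ravel(), edges.ravel()], axis=1)
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-12)  # equalize scales
    # Deterministic start: centers spread over the range of weighted intensity.
    order = np.argsort(feats[:, 0])
    centers = feats[order[np.linspace(0, len(order) - 1, k).astype(int)]]
    for _ in range(iters):
        dist = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    means = [feats[labels == j, 0].mean() if np.any(labels == j) else -np.inf for j in range(k)]
    return (labels == int(np.argmax(means))).reshape(roi.shape)
```

The Sobel image lets pixels on the object border join the object cluster even when their intensity alone would be ambiguous. This is one plausible reading of why the two images are used together as clustering features.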
For the system trained with average masses, global and local multiresolution texture analysis42 was performed in each ROI by using the spatial gray level dependence (SGLD) matrices. A total of 364 features were extracted from global texture analysis. Local texture features were extracted from the local region containing the detected object and from the peripheral regions within each ROI. A total of 208 features were extracted for local texture analysis. For the system trained with subtle masses, gray level features and run length statistics (RLS) texture features43 were extracted instead of the SGLD texture features. They were extracted inside and outside of each mass region on the original image and on the gradient field image. The gray level features included the contrast of the object relative to the surrounding background and the minimum and maximum gray levels. They also included characteristics derived from the gray level histogram in the regions inside and outside of each object: skewness, kurtosis, energy, and entropy. Five RLS texture features were extracted in both the horizontal and vertical directions: short runs emphasis, long runs emphasis, gray level nonuniformity, run length nonuniformity, and run percentage. A total of 66 features were extracted for the system trained with subtle masses.
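The five RLS features named above can be computed from a run-length matrix. The sketch below uses the standard definitions, in which each entry counts the runs of a given gray level and length. It is an illustration, not the study's implementation. It assumes integer gray levels in [0, levels) and handles the horizontal direction; the vertical features follow by passing the transposed image. The function names are hypothetical.

```python
import numpy as np


def run_length_matrix(img: np.ndarray, levels: int) -> np.ndarray:
    """Horizontal run-length matrix: entry [g, n - 1] counts runs of gray level g with length n."""
    rows, cols = img.shape
    rlm = np.zeros((levels, cols), dtype=float)
    for row in img:
        start = 0
        for c in range(1, cols + 1):
            # A run ends at the row end or where the gray level changes.
            if c == cols or row[c] != row[start]:
                rlm[row[start], c - start - 1] += 1
                start = c
    return rlm


def rls_features(img: np.ndarray, levels: int) -> dict:
    """Short/long runs emphasis, gray level and run length nonuniformity, run percentage."""
    rlm = run_length_matrix(img, levels)
    n_runs = rlm.sum()
    lengths = np.arange(1, rlm.shape[1] + 1, dtype=float)
    return {
        "short_runs_emphasis": float((rlm / lengths ** 2).sum() / n_runs),
        "long_runs_emphasis": float((rlm * lengths ** 2).sum() / n_runs),
        "gray_level_nonuniformity": float((rlm.sum(axis=1) ** 2).sum() / n_runs),
        "run_length_nonuniformity": float((rlm.sum(axis=0) ** 2).sum() / n_runs),
        "run_percentage": float(n_runs / img.size),
    }
```

For example, `rls_features(region.T, levels)` gives the vertical-direction values for the same hypothetical `region`.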

In order to obtain the best texture feature subset and also reduce the dimensionality of the feature space to design an effective classifier, stepwise feature selection with linear discriminant analysis (LDA) was applied to the training subset. The detailed procedure has been described elsewhere.-4,44,45 Briefly, at each step one feature was entered into or removed from the feature pool by analyzing its effect on the selection criterion, which was chosen to be Wilks' lambda in this study. Since the appropriate threshold values for feature entry, feature elimination, and tolerance of correlation were unknown, we used an automated simplex optimization method to search for the best combination of thresholds in the parameter space. The simplex algorithm used a leave-one-case-out resampling method within the training subset to select features and estimate the weights for the LDA classifier. To provide a figure of merit to guide feature selection, the test discriminant scores from the left-out cases were analyzed using receiver operating characteristic (ROC) methodology.46 The accuracy for classification of masses and FPs was evaluated as the area under the ROC curve, Az. In this approach, feature selection was performed without the left-out case so that the test performance would be less optimistically biased.47 However, the selected feature set in each leave-one-case-out cycle could be slightly different because every cycle had one training case different from the other cycles. In order to obtain a single trained classifier to apply to the independent test subset, a final stepwise feature selection was performed on the entire training subset, with the best combination of thresholds found in the simplex optimization procedure, to obtain the final set of features and estimate the weights of the LDA. Note that the entire process of feature selection and classifier weight estimation was performed within the training subset. The LDA classifier with the selected feature set was then fixed and applied to the independent test subset. The training and testing processes were performed independently for the two-fold cross-validation sets.

Fig. 4. Schematic diagram of the proposed dual CAD system for mass detection. BP-ANN is used for information fusion.
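The feature-entry step of stepwise selection with Wilks' lambda can be sketched as below. This is an illustrative sketch only: it computes Wilks' lambda as det(W)/det(T), with W the pooled within-class scatter and T the total scatter, and the standard partial F-to-enter derived from the change in lambda. It omits feature removal, the correlation-tolerance check, and the simplex search over thresholds. `forward_step` and `f_enter` are hypothetical names, not the study's code.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[piv][i]) < 1e-12:
            return 0.0
        if piv != i:
            a[i], a[piv] = a[piv], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d


def scatter(rows, cols):
    """Sum of outer products of mean-centered feature vectors (columns `cols`)."""
    k = len(cols)
    mean = [sum(x[c] for x in rows) / len(rows) for c in cols]
    s = [[0.0] * k for _ in range(k)]
    for x in rows:
        v = [x[c] - mean[j] for j, c in enumerate(cols)]
        for i in range(k):
            for j in range(k):
                s[i][j] += v[i] * v[j]
    return s


def wilks_lambda(X, y, cols):
    """det(within-class scatter) / det(total scatter); smaller = better separation."""
    if not cols:
        return 1.0
    total = scatter(X, cols)
    within = [[0.0] * len(cols) for _ in cols]
    for label in set(y):
        s = scatter([x for x, t in zip(X, y) if t == label], cols)
        for i in range(len(cols)):
            for j in range(len(cols)):
                within[i][j] += s[i][j]
    d_total = det(total)
    return det(within) / d_total if d_total > 0 else 1.0


def forward_step(X, y, selected, f_enter):
    """Enter the feature with the largest partial F, if it reaches f_enter."""
    n, g, p = len(X), len(set(y)), len(selected)
    lam_p = wilks_lambda(X, y, selected)
    best = None
    for c in range(len(X[0])):
        if c in selected:
            continue
        lam = wilks_lambda(X, y, selected + [c])
        if lam <= 0:  # degenerate within-class scatter; skipped in this sketch
            continue
        f = (lam_p / lam - 1.0) * (n - g - p) / (g - 1)
        if best is None or f > best[1]:
            best = (c, f)
    if best is not None and best[1] >= f_enter:
        return selected + [best[0]]
    return selected
```

In the study, the entry, removal, and tolerance thresholds were themselves searched by simplex optimization; here `f_enter` is simply passed in.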

2.  Training  and  test  for  dual  system 

The block diagram for the dual system is shown in Fig. 4. During the training of the dual system, we used the current and prior mammograms from the same patients. The current mammograms that contained the average masses were used only to train the first single CAD system. The prior mammograms that contained the subtle masses were used only to train the second single CAD system. The prescreening and the segmentation steps in the two systems are identical. Since the morphological appearances of average and subtle masses are different, the rules in the morphological rule-based FP classification were trained separately for the two single CAD systems. During testing with an independent mammogram, the dual system keeps all the suspicious objects that satisfy the FP classification rules of either single CAD system and applies the LDA classifiers from both single systems to each object. Each object thus has two LDA scores.
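The candidate-pooling rule used by the dual system at test time can be expressed as a short sketch. The rule functions, feature indices, and LDA weights below are hypothetical placeholders; the sketch only illustrates that an object is kept if it passes either system's FP rules and is then scored by both LDA classifiers, each using its own selected features.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class SingleSystem:
    """One trained single CAD system (hypothetical container for this sketch)."""
    passes_rules: Callable[[Sequence[float]], bool]  # morphological rule-based FP screen
    feature_idx: Sequence[int]                       # features kept by stepwise selection
    lda_weights: Sequence[float]                     # [bias, w_1, ..., w_k]

    def lda_score(self, features: Sequence[float]) -> float:
        w = self.lda_weights
        return w[0] + sum(wi * features[j] for wi, j in zip(w[1:], self.feature_idx))


def dual_candidates(objects: List[Sequence[float]], avg: SingleSystem,
                    subtle: SingleSystem) -> List[Tuple[int, float, float]]:
    """Keep objects passing EITHER rule set and score each with BOTH LDA classifiers."""
    kept = []
    for k, feats in enumerate(objects):
        if avg.passes_rules(feats) or subtle.passes_rules(feats):
            kept.append((k, avg.lda_score(feats), subtle.lda_score(feats)))
    return kept
```

Each returned pair of scores is the two-dimensional input that the fusion network receives.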

To merge the information from the two CAD systems, a fusion scheme was developed for our dual system. In this study, a feed-forward backpropagation artificial neural network (BP-ANN) was trained to distinguish masses from normal tissues by combining the output information from the two single CAD systems. The LDA classifiers from the two single CAD systems were applied to each detected object, and the two LDA discriminant scores for each object were used as input to the BP-ANN. The BP-ANN had an input layer with two nodes, a hidden layer with N nodes, and an output layer with one node. The nodes were interconnected by weights, and information propagated from one layer to the next through a log-sigmoidal activation function. Training of the ANN was a supervised process in which cases of known truth were input to the ANN. The performance function for the network was the mean-squared error between the network outputs and the target outputs. The weights of the network were adjusted iteratively by the backpropagation procedure to minimize the error. A detailed description of the backpropagation neural network can be found in the literature.48,49
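A minimal 2-N-1 fusion network of the kind described, with log-sigmoid units and a squared-error objective minimized by backpropagation, might look like the sketch below. It uses plain online gradient descent with an assumed learning rate, epoch count, and weight initialization; the study's actual training algorithm and stopping rule are not given in the text, and the class name is hypothetical.

```python
import math
import random


def logsig(z):
    """Log-sigmoid activation, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class FusionANN:
    """2-N-1 feed-forward network with log-sigmoid units, trained by backpropagation."""

    def __init__(self, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each hidden unit: [weight for score 1, weight for score 2, bias]
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(n_hidden)]
        # Output unit: one weight per hidden unit, followed by a bias
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [logsig(w[0] * x[0] + w[1] * x[1] + w[2]) for w in self.w1]
        o = logsig(sum(wj * hj for wj, hj in zip(self.w2, h)) + self.w2[-1])
        return h, o

    def predict(self, x):
        return self.forward(x)[1]

    def mse(self, X, t):
        return sum((self.predict(x) - ti) ** 2 for x, ti in zip(X, t)) / len(X)

    def train(self, X, t, lr=0.5, epochs=5000):
        """Online gradient descent on the per-sample error 0.5 * (o - t)^2."""
        for _ in range(epochs):
            for x, ti in zip(X, t):
                h, o = self.forward(x)
                delta_o = (o - ti) * o * (1.0 - o)
                for j, hj in enumerate(h):
                    # Hidden delta uses the output weight before it is updated
                    delta_h = delta_o * self.w2[j] * hj * (1.0 - hj)
                    self.w2[j] -= lr * delta_o * hj
                    self.w1[j][0] -= lr * delta_h * x[0]
                    self.w1[j][1] -= lr * delta_h * x[1]
                    self.w1[j][2] -= lr * delta_h
                self.w2[-1] -= lr * delta_o
```

In the dual system, each input `x` would be the pair of LDA scores of one object, with target 1 for a mass and 0 for normal tissue. The accompanying check uses a toy AND problem only to confirm that training lowers the error.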

To choose the number of hidden nodes (N) in the BP-ANN, we used a three-fold cross-validation method within the training subset. We randomly separated the entire training subset, including all detected objects, into three independent groups. Objects belonging to the same case were kept in the same group. For a given N, three training cycles were performed; in each, two of the three groups were used to train the BP-ANN and the left-out group was used to test its performance. The Az value obtained from the ANN output scores for the test group was used as the performance index for that training cycle. The average of the Az values from the three test groups represented the performance of the BP-ANN with N hidden nodes. In our experiment, a BP-ANN with 3 hidden nodes provided the largest average Az value and was therefore chosen. The weights of the chosen BP-ANN were then retrained with the entire training subset. The BP-ANN with the trained weights was used to merge the information from the two single CAD systems.
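The case-grouped three-fold selection of N can be sketched as follows. Two substitutions are assumptions of this sketch: the ROC area is estimated with the nonparametric Mann-Whitney statistic rather than the fitted Az used in the study, and `fit_and_score` is a hypothetical callback that trains a network with a given number of hidden nodes and returns scores for the held-out objects. Each held-out group must contain both mass and normal objects.

```python
import random


def empirical_auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of the ROC area (ties count one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos_scores) * len(neg_scores))


def case_folds(case_ids, k=3, seed=0):
    """Randomly assign whole cases (not single objects) to k groups."""
    cases = sorted(set(case_ids))
    random.Random(seed).shuffle(cases)
    group_of = {c: i % k for i, c in enumerate(cases)}
    return [group_of[c] for c in case_ids]


def choose_hidden_nodes(objects, labels, case_ids, candidates, fit_and_score, k=3):
    """Return (best N, its mean held-out AUC) over the candidate hidden-node counts."""
    folds = case_folds(case_ids, k)
    best = None
    for n_hidden in candidates:
        aucs = []
        for g in range(k):
            train = [i for i, f in enumerate(folds) if f != g]
            test = [i for i, f in enumerate(folds) if f == g]
            scores = fit_and_score(n_hidden,
                                   [objects[i] for i in train],
                                   [labels[i] for i in train],
                                   [objects[i] for i in test])
            pos = [s for s, i in zip(scores, test) if labels[i] == 1]
            neg = [s for s, i in zip(scores, test) if labels[i] == 0]
            aucs.append(empirical_auc(pos, neg))
        mean_auc = sum(aucs) / k
        if best is None or mean_auc > best[1]:
            best = (n_hidden, mean_auc)
    return best
```

Grouping by case keeps correlated objects from the same patient out of both the training and held-out groups at once, which is the point of the constraint stated in the text.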

To test the dual system, the two trained single CAD systems, one trained with the average mass set and the other with the subtle mass set, were applied in parallel to each single "unknown" mammogram in the independent test subset. No prior mammogram was needed during testing.

3.  Evaluation  methods 

The detected individual objects were compared with the "truth" ROI marked by the experienced radiologist, as described earlier. A detected object was scored as a TP if the overlap between its bounding box and the bounding box of the true mass, relative to the larger of the two bounding boxes, was over 25%; otherwise, it was scored as an FP. The 25% threshold was selected as described in our previous study."1
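The TP/FP scoring rule can be written directly from its definition. This is a sketch that assumes axis-aligned bounding boxes given as (x0, y0, x1, y1) in continuous coordinates; a pixel-inclusive convention would change the areas slightly, and the function names are hypothetical.

```python
def box_area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def overlap_ratio(det_box, truth_box):
    """Intersection area of two boxes divided by the area of the larger box.

    Boxes are (x0, y0, x1, y1) with x1 > x0 and y1 > y0.
    """
    ix = max(0.0, min(det_box[2], truth_box[2]) - max(det_box[0], truth_box[0]))
    iy = max(0.0, min(det_box[3], truth_box[3]) - max(det_box[1], truth_box[1]))
    return ix * iy / max(box_area(det_box), box_area(truth_box))


def score_detection(det_box, truth_box, threshold=0.25):
    """TP if the relative overlap exceeds the threshold (25% in the text), else FP."""
    return "TP" if overlap_ratio(det_box, truth_box) > threshold else "FP"
```

Normalizing by the larger box penalizes both oversized and undersized detections: a small box lying entirely inside a large true mass, covering 16% of it, is still scored as an FP.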

The FP marker rate was estimated in two ways: one from detection on the same test subsets with masses, the other from detection on the normal data set of negative mammograms. For the latter, we applied the trained dual CAD system to the normal data set. The number of FP marks produced by the CAD system was determined by counting the detected objects on the normal cases. The mass detection sensitivity was determined by counting the detected masses on the test mass subset. The detection performance of the CAD system was assessed by free-response ROC (FROC) analysis. An FROC curve was obtained by plotting the mass detection sensitivity as a function of the number of FP marks per image, obtained from either the mass data subset or the normal set, at the corresponding decision threshold.

FROC curves were presented on a per-mammogram and a per-case basis. For image-based FROC analysis, the mass on each mammogram was considered an independent true object. For case-based FROC analysis, the same mass imaged on the two-view mammograms was considered to be one true object, and detection of the mass on either or both views was considered to be a TP detection.

Fig. 5. An example of a scatter plot of the LDA scores from the two single CAD systems, which are used as input to the BP-ANN. The correlation coefficient between the scores of the two LDA classifiers is 0.46, indicating that the two LDA scores are relatively independent features.
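Image-based and case-based FROC points can be computed from scored detections as in the following sketch. The data layout is an assumption: each detection carries the ID of the mass view it hits, or None for an FP mark, and an optional mapping merges the two views of a mass for case-based scoring. The sketch counts FP marks from the same detection list; for the normal-set estimate, FP marks would instead be counted over the normal images. The function name is hypothetical.

```python
def froc_points(detections, true_masses, n_images, thresholds, lesion_of=None):
    """(FP marks per image, sensitivity) at each decision threshold.

    detections:  (score, mass_id) pairs; mass_id is None for an FP mark.
    true_masses: IDs of every true mass view in the test set.
    n_images:    number of images the FP marks were counted on.
    lesion_of:   optional dict mapping a mass view ID to a case-level lesion ID;
                 when given, the two views of a mass form one true object and
                 a hit on either view counts as a detection (case-based).
    """
    def key(m):
        return lesion_of[m] if lesion_of else m

    truths = {key(m) for m in true_masses}
    points = []
    for t in sorted(thresholds, reverse=True):
        hit = {key(m) for s, m in detections if s >= t and m is not None}
        n_fp = sum(1 for s, m in detections if s >= t and m is None)
        points.append((n_fp / n_images, len(hit & truths) / len(truths)))
    return points
```

With the same detections, the case-based curve lies on or above the image-based one, because a mass counts as found as soon as either view is marked.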

Since we used the two-fold cross-validation method for training and testing, we obtained two test FROC curves, one for each test subset, for each of the conditions (e.g., single CAD system approach or dual system approach). To summarize the results for comparison, an average test FROC curve was derived by averaging the FP rates at the same sensitivity along the FROC curves of the two corresponding test subsets.
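The averaging of two test FROC curves at equal sensitivity can be sketched as below. Linear interpolation between measured operating points is an assumption made for this illustration; the text does not state how intermediate points were obtained. Sensitivities not reached by both curves are skipped, and the function names are hypothetical.

```python
def fp_at_sensitivity(curve, sens):
    """Linearly interpolate FP marks/image at a given sensitivity.

    curve: (fp_per_image, sensitivity) points of one FROC curve.
    Returns None if the curve never reaches `sens`.
    """
    pts = sorted(curve, key=lambda p: p[1])
    for (f0, s0), (f1, s1) in zip(pts, pts[1:]):
        if s0 <= sens <= s1:
            if s1 == s0:
                return min(f0, f1)
            return f0 + (f1 - f0) * (sens - s0) / (s1 - s0)
    return None


def average_froc(curve_a, curve_b, sensitivities):
    """Average the FP rates of two curves at each sensitivity both curves reach."""
    avg = []
    for s in sensitivities:
        fa, fb = fp_at_sensitivity(curve_a, s), fp_at_sensitivity(curve_b, s)
        if fa is not None and fb is not None:
            avg.append(((fa + fb) / 2, s))
    return avg
```

Averaging along the FP axis at fixed sensitivity, rather than averaging sensitivities at fixed FP rate, matches the procedure stated in the text.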

In order to compare the performance of the single CAD system and the dual CAD system, we applied the alternative free-response ROC (AFROC) method and the jackknife free-response ROC (JAFROC) method developed by Chakraborty et al.50,51 to the pairs of FROC curves. In the AFROC method, the FROC data are first transformed by counting the number of false-positive images (FPIs) instead of the FPs per image. The confidence rating of an FPI is determined by the highest-confidence FP decision on the image, regardless of how many lower-confidence FP decisions are made on the same image. The ROCKIT curve-fitting software and statistical significance tests for ROC analysis developed by Metz et al.46 can then be used to analyze the AFROC data.
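The AFROC transformation described above, which rates each image by its highest-confidence FP mark, can be sketched as follows. This sketch assumes FP images are taken from normal images and produces only empirical (FPF, LLF) operating points; the curve fitting and significance testing done with ROCKIT or JAFROC are not reproduced, and the function name is hypothetical.

```python
def afroc_points(normal_image_fp_scores, lesion_scores, thresholds):
    """Empirical AFROC operating points (FPF, LLF), one per threshold.

    normal_image_fp_scores: one list of FP mark scores per normal image
        (an empty list means the system marked nothing on that image).
    lesion_scores: highest score of a correct localization for each lesion,
        or None if the lesion was never marked.
    FPF: fraction of normal images with a rating at or above the threshold.
    LLF: fraction of lesions correctly localized at or above the threshold.
    """
    # Each image is rated by its highest-confidence FP mark; lower marks are ignored
    image_ratings = [max(s) if s else float("-inf") for s in normal_image_fp_scores]
    n_img, n_les = len(image_ratings), len(lesion_scores)
    points = []
    for t in sorted(thresholds, reverse=True):
        fpf = sum(r >= t for r in image_ratings) / n_img
        llf = sum(s is not None and s >= t for s in lesion_scores) / n_les
        points.append((fpf, llf))
    return points
```

Collapsing each image to a single rating turns the free-response data into one decision per image, which is what allows ROC tools to be applied to it.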


III.  RESULTS 

Figure 5 shows an example of the two-dimensional feature space that was used as the input to the BP-ANN being trained to merge the information from the two single CAD subsystems. The two features are the output scores of the LDA classifiers trained with the average masses and with the subtle masses. The correlation coefficients of the two features are 0.46 and 0.44 for the two training subsets, respectively. The relatively low correlation indicates that the two single CAD systems extracted relatively independent features from the object. The Az values of the chosen ANN were 0.92 ± 0.01 and 0.87 ± 0.01, respectively, as estimated by validation in the training process. The ANN classifiers achieved Az values of 0.90 ± 0.02 and 0.89 ± 0.01 on the two independent test subsets, respectively. Figure 6 shows the ROC curves for the two test subsets.

Fig. 6. The test ROC curves for the BP-ANN classifiers from the two independent mass subsets. The ANN classifiers achieved an Az value of 0.90 ± 0.02 for test subset 1 and 0.89 ± 0.01 for test subset 2 in the classification of mass and normal breast tissues.

In order to evaluate the effectiveness of our dual system approach, we compared its performance on the test subsets containing average masses with that of two single CAD systems: the CAD system trained only on the average mass set and the CAD system trained on both the average and the subtle mass sets. When a single CAD system was trained only with the average masses, the numbers of selected texture features were 21 (14 global and 7 local) and 16 (10 global and 6 local) for the two independent training subsets, respectively. When the CAD system was trained with both the average and the subtle masses, the numbers of selected texture features were 17 (11 global and 6 local) and 18 (7 global and 11 local) for the two independent training subsets, respectively.

For the dual system, the single system trained with the average masses was the same as that described earlier. For the single system trained with subtle masses, four (2 gray level and 2 RLS texture) and five (3 gray level and 2 RLS texture) features were selected for the two independent training subsets, respectively.

The average test FROC curves of the dual CAD system on the test subsets with average masses are compared to those of the single CAD systems in Fig. 7. The FP rates were estimated from the mass data set. The dual CAD system achieved case-based sensitivities of 80%, 85%, and 90% at 0.6, 0.8, and 1.0 FPs/image, respectively, compared with 1.3, 1.5, and 1.8 FPs/image for the single CAD system trained with average masses alone. The performance of the single CAD system trained with both the average masses and the subtle masses was comparable to that of the system trained with average masses alone, with FP rates of 1.4, 1.6, and 1.8 FPs/image at the same sensitivities, respectively. Figure 8 shows a comparison of the three average test FROC curves, similar to those shown in Fig. 7, except that the FP rates were estimated from the normal data set. The FP rates at a few selected sensitivities for the dual and single CAD systems are summarized in Table II.


Fig. 7. Comparison of the average test FROC curves obtained from averaging the FROC curves of the two independent average-mass subsets. Three CAD systems were compared: a single CAD system trained with average masses alone, a single CAD system trained with both the average and the subtle masses, and the dual CAD system. The FP rate was estimated from the mammograms with masses. (a) Image-based FROC curves, (b) case-based FROC curves.


In this study, there were 67 malignant cases in the average mass set. Figure 9 compares the average test FROC curves of the single CAD system and the dual system for detection of malignant masses. The result for the single CAD system trained with average masses is shown, and the FP rate was estimated from the mammograms without masses. In this case, the dual CAD system achieved case-based sensitivities of 80%, 85%, and 90% at 0.6, 0.9, and 1.2 FP marks/image, respectively, compared with 1.1, 1.6, and 2.0 FP marks/image for the single CAD system.

An important purpose of a CAD system is to serve as a second reader to alert radiologists to subtle cancers that may be overlooked. Figures 10 and 11 compare the average FROC curves of the single CAD system and the dual system for detection in the test subsets with subtle masses. The TP rate in Fig. 10 was estimated by including both malignant and benign masses, and that in Fig. 11 was estimated from malignant masses only. The single CAD system trained with average masses alone was used. The FP rates for both systems were estimated from the mammograms without masses. The dual CAD system achieved a case-based sensitivity of 50% at 0.7 FP marks/image for all masses and at 0.5 FP marks/image for malignant masses only, compared with 1.4 FP marks/image for all masses and 1.1 FP marks/image for malignant masses only using the single CAD system.

Medical Physics, Vol. 33, No. 11, November 2006

Wei et al.: Dual CAD system for mammographic mass detection

Fig. 8. Comparison of the average test FROC curves obtained from averaging the FROC curves of the two independent average-mass subsets. Three CAD systems were compared: a single CAD system trained with average masses only, a single CAD system trained with both the average and the subtle masses, and the dual CAD system. The FP rate was estimated from the mammograms without masses. (a) Image-based FROC curves, (b) case-based FROC curves.

Fig. 9. Comparison of the average test FROC curves of the single CAD system and the dual CAD system for detection of malignant masses in the average data set. The single system trained with average masses alone was used, and the FP rate was estimated from the mammograms without masses. (a) Image-based FROC curves, (b) case-based FROC curves.

Table II. Comparison of case-based detection performance between the dual system and the single CAD system trained with average masses alone. The FP marker rates were estimated from detection on the normal data set. The FROC curves were obtained by averaging the FROC curves of the two test subsets.

        Average mass test set   Subtle mass test set
        (FP marks/image)        (FP marks/image)
TP      Single     Dual         Single     Dual
        system     system       system     system
90%     2.2        1.2          –          –
80%     1.5        0.7          –          2.8
70%     1.0        0.3          2.4        2.3
60%     0.5        0.2          1.8        1.5
50%     0.3        0.1          1.4        0.7

Table II summarizes the test results on the average and subtle mass sets for the dual system and the single CAD system trained with average masses at different sensitivity levels. The FP marker rates were estimated from the detection on the normal data set.

The comparison of the FROC curves for the dual CAD system and the single CAD system, in terms of the area under the fitted AFROC curve (A1) and the p values for both test subsets with average masses, is summarized in Table III. The differences between the A1 values for the two systems were statistically significant (p < 0.05). The fitted AFROC curves, however, did not fit the transformed AFROC data very well, as we discussed previously.-4 For the JAFROC method, Chakraborty et al. provided software to estimate the statistical significance of the difference between two FROC curves. The comparison of the figure of merit (FOM) and the p values is also summarized in Table III. The differences between the FOM of the dual CAD system and that of the single CAD system for both test subsets were again statistically significant (p < 0.05).

Fig. 10. Comparison of the average test FROC curves for the single CAD system and the dual CAD system for detection of the subtle masses on the prior mammograms. The single CAD system trained with average masses alone was used, and the FP rate was estimated from the mammograms without masses. (a) Image-based FROC curves, (b) case-based FROC curves.

Fig. 11. Comparison of the average test FROC curves for the single CAD system and the dual CAD system for detection of subtle malignant masses on the prior mammograms. The single CAD system trained with average masses alone was used, and the FP rate was estimated from the mammograms without masses. (a) Image-based FROC curves, (b) case-based FROC curves.

The comparison of A1, the FOM, and the p values for the dual system and the single system trained with average masses in detecting subtle masses is summarized in Table IV. The differences between the results of the dual CAD system and those of the single CAD system on the two test subsets containing subtle masses were statistically significant by both the JAFROC and the AFROC methods.

IV.  DISCUSSION 

The masses on prior mammograms are more subtle and more difficult to detect than the masses on current mammograms. In this study, we developed a dual CAD system, which combines a system trained with masses on prior mammograms and a system trained with masses detected on current mammograms. We have demonstrated that this dual system can increase the accuracy of detecting both average masses and subtle masses. The comparison of the dual system with the single CAD system trained with average masses alone and with the single CAD system trained with both average and subtle masses (Fig. 7) indicates that the gain in detection accuracy of the dual system could not be achieved by simply using a larger training set with both average and subtle masses. In fact, it is interesting to note that the performance of the single CAD system trained with both the average and the subtle masses appeared to be degraded slightly, in comparison with the single system trained with average masses alone, when it was applied to the test set of average masses. The decreased performance may reflect the compromise made when the single CAD system was trained to accommodate a wide range of lesion characteristics. Thus, the dual system approach may have improved its performance through other factors, including the flexibility of using different feature spaces and training the parameters for each type of mass, and the information fusion that combines the two single CAD systems effectively.
Table III. Estimation of the statistical significance of the difference between the FROC performance of the dual system and the single CAD system trained with average masses alone when the systems were evaluated on the average mass test subsets. The FROC curves with the FP marker rates obtained from the normal data set were compared.

                 A1 (AFROC)                              FOM (JAFROC)
                 All cases           Malignant cases     All cases           Malignant cases
                 Subset 1  Subset 2  Subset 1  Subset 2  Subset 1  Subset 2  Subset 1  Subset 2
Single system    0.45      0.44      0.47      0.52      0.48      0.48      0.53      0.55
Dual system      0.55      0.53      0.58      0.62      0.60      0.56      0.63      0.64
p values         0.0004    0.0156    0.0003    0.0318    <0.0001   0.007     0.0004    0.0252

To compare the different systems, we analyzed the false negatives (FNs) of the single CAD systems and the dual CAD system when the test subsets with average masses were used. The FN rates of the single CAD system trained with average masses, the single CAD system trained with subtle masses, and the dual system were 23.9% (55/230), 28.3% (65/230), and 16.5% (38/230), respectively, after FP reduction by the morphological LDA classifier in each system. Twenty-nine masses were missed by both of the single systems. By using the dual system, 53 masses that were FNs for one of the two single systems could be detected. However, the masses that were missed by both of the single CAD systems could not be recovered by the dual CAD system.
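The FN counts above can be cross-checked with simple set arithmetic. Reconciling the 53 recovered masses assumes that all of the dual system's misses beyond the 29 shared ones come from masses that only one single system had missed; this is an inference from the reported numbers, not a statement in the text.

```python
# Counts reported for the 230 masses in the average-mass test subsets
total = 230
fn = {"single_average": 55, "single_subtle": 65, "dual": 38}
fn_both = 29  # missed by both single systems; not recoverable by the dual system

# FN rates in percent, rounded as in the text
rates = {name: round(100 * n / total, 1) for name, n in fn.items()}

# Inclusion-exclusion: masses missed by at least one single system
missed_by_either = fn["single_average"] + fn["single_subtle"] - fn_both
missed_by_one_only = missed_by_either - fn_both

# Assumption: the dual system's misses = the 29 shared misses
# plus some masses that only one single system had missed
still_missed = fn["dual"] - fn_both
recovered = missed_by_one_only - still_missed

print(rates, missed_by_either, missed_by_one_only, recovered)
```

Under that assumption the figures are mutually consistent: 91 masses were missed by at least one single system, 62 by exactly one, and 53 of those 62 were recovered.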

Our motivation in this study was to improve the performance of a CAD system for mass detection. A CAD detection system is generally intended for use in screening mammography. At the screening stage, all lesions of concern should be pointed out to radiologists so that the radiologists can judge whether a recall is warranted. If a detection system is trained to mark only malignant lesions, it may in effect play the role of a triage system (alerting radiologists to work up only "malignant" cases) rather than that of a second reader. Furthermore, since computerized lesion detection or characterization on mammograms is not 100% sensitive, radiologists may be unsure whether an unmarked suspicious lesion was missed by the computer or was considered benign by it. We believe that computer-aided diagnosis (CADx) may be used in different ways in conjunction with a CAD detection system; for example, the likelihood of malignancy may be estimated by the CADx system and displayed for every detected lesion, and/or a CADx system may be used during diagnostic workup. Either way, the CAD system will first alert radiologists to all masses, leaving the assessment of malignancy or benignity to a second stage, with the radiologist being the primary decision maker. The training set thus included both malignant and benign masses.

For a CAD system, its performance in detecting malignant masses is more important than its performance in detecting all masses. The FROC curves for detection of malignant masses on the average data set and the subtle data set, shown in Figs. 9 and 11, respectively, indicate that the dual system also improved the detection performance over that of the single system. The differences in A1 and the FOM for the detection of malignant cases in the average and subtle mass test subsets were statistically significant, as shown in Tables III and IV, respectively.

In screening mammography, the cancer rate is about 3–5 per 1000, so most mammograms are normal. Therefore, some CAD researchers and users estimate the FP rate using normal mammograms52–54 because it reflects how the CAD system performs in terms of specificity and whether the CAD system may cause extra effort for radiologists to double-check the marked locations, or unnecessary recalls, in a screening setting. Furthermore, for CAD systems that set a maximum number of detected objects at the output, estimating the number of FPs using images with lesions can potentially lead to an optimistic bias in the FROC curve, because one of the detected objects will likely be the true lesion. The FP rate can thus be underestimated by as much as 1 per image. In addition, the JAFROC analysis requires that the FP rates be estimated on normal images. We therefore reported the FP rates of our CAD systems on both mammograms with masses and mammograms without masses to facilitate comparison with other CAD systems, since investigators may evaluate their FP rates in either way.

Table IV. Estimation of the statistical significance of the difference between the FROC performance of the dual system and the single CAD system trained with average masses alone when the systems were evaluated on the subtle mass test subsets. The FROC curves with the FP marker rates obtained from the normal data set were compared.

                 A1 (AFROC)                              FOM (JAFROC)
                 All cases           Malignant cases     All cases           Malignant cases
                 Subset 1  Subset 2  Subset 1  Subset 2  Subset 1  Subset 2  Subset 1  Subset 2
Single system    0.17      0.20      0.24      0.25      0.21      0.23      0.24      0.26
Dual system      0.28      0.25      0.35      0.34      0.30      0.28      0.36      0.34
p values         <0.0001   0.046     <0.0001   0.0067    0.0007    0.048     <0.0001   0.0035

In this study, we evaluated the performance of the trained CAD systems with an independent test set using the two-fold cross-validation method. Although the selection of parameters and features was performed using the training set, we had full knowledge of the performance on the test set, so the selections could be optimistically biased. Truly independent testing will have to be performed with unknown cases that have never been used for testing the CAD system, such as those in a prospective clinical trial. However, this test step is beyond the scope of our current developmental process. Since we used the same cross-validation method for evaluation of the dual system and the single CAD systems, the comparison of their relative performances is expected to be less biased than their individual performances.


V.  CONCLUSION 

We have proposed a new dual system approach, which combines a system trained with subtle masses on prior mammograms and a system trained with average masses on current mammograms. The dual system achieved higher sensitivities at the corresponding FP rates than a single CAD system trained with average masses alone or trained with both average masses and subtle masses. Equivalently, the dual system had lower FP rates than the single CAD system at corresponding sensitivities. The improvement in the FROC curves by the dual system approach was found to be statistically significant (p < 0.05) for both average masses and subtle masses using either the AFROC or the JAFROC method. Our results indicate that the dual system approach is promising for improving the performance of CAD systems for mass detection on mammograms.
Rationale and Objectives. To compare the performance of computer-aided detection (CAD) systems on pairs of full-field digital mammograms (FFDMs) and screen-film mammograms (SFMs) obtained from the same patients.

Materials and Methods. Our CAD systems on both modalities have similar architectures that consist of five steps. For FFDMs, the input raw image is first log-transformed and enhanced by a multiresolution preprocessing scheme. For digitized SFMs, the input image is smoothed and subsampled to a pixel size of 100 µm × 100 µm. For both CAD systems, the mammogram after preprocessing undergoes a gradient field analysis followed by clustering-based region growing to identify suspicious breast structures. Each of these structures is refined in a local segmentation process. Morphologic and texture features are then extracted from each detected structure, and trained rule-based and linear discriminant analysis classifiers are used to differentiate masses from normal tissues. Two datasets, one with masses and the other without masses, were collected. The mass dataset contained 131 cases with 131 biopsy-proven masses, of which 27 were malignant and 104 benign. The true locations of the masses were identified by an experienced Mammography Quality Standards Act (MQSA) radiologist. The no-mass dataset contained 98 cases. The time interval between the FFDM and the corresponding SFM was 0 to 118 days.

Results. Our CAD system achieved case-based sensitivities of 70%, 80%, and 90% at 0.9, 1.5, and 2.6 false-positive (FP) marks/image, respectively, on FFDMs, and the same sensitivities at 1.0, 1.4, and 2.6 FP marks/image, respectively, on SFMs.

Conclusions. The difference in the performances of our FFDM and SFM CAD systems did not achieve statistical significance.

Key Words. Computer-aided detection; mass detection; full-field digital mammogram (FFDM); screen-film mammogram (SFM); free-response receiver operating characteristic (FROC).

© AUR, 2007


Acad  Radiol  2007;  14:659-669 

1 From the Department of Radiology, University of Michigan, CGC B2103, 1500 E. Medical Center Drive, Ann Arbor, MI 48109 (J.W., L.M.H., B.S., H.-P.C.,
J.G.,  M.A.R.,  M.A.H.,  C.Z.,  Y.-T.W.,  C.P.,  Y.Z.).  Received  December  15,  2006;  accepted  February  13,  2007.  This  work  is  supported  by  US  Army  Medical 
Research  and  Materiel  Command  grants  W81XWH-1 -04-1 -0475  and  DAMD  17-02-1-0214,  and  USPHS  grant  CA95153.  The  content  of  this  article 
does  not  necessarily  reflect  the  position  of  the  government  and  no  official  endorsement  of  any  equipment  and  product  of  any  companies  mentioned 
should be inferred. The authors are grateful to Charles E. Metz, PhD, for the LABROC program and to Dev Chakraborty, PhD, for the JAFROC program. Address correspondence to: J.W. e-mail: jvwei@umich.edu.


doi:10.1016/j.acra.2007.02.017




WEI  ET  AL 


Academic  Radiology,  Vol  14,  No  6,  June  2007 


Full-field digital mammography (FFDM) and screen-film mammography (SFM) are two available methods for breast cancer screening in clinical practice. FFDM detectors provide higher detective quantum efficiency (DQE) and signal-to-noise ratio (SNR), wider dynamic range, and higher contrast sensitivity than SFM. FFDM may alleviate some of the limitations of SFM, especially in breasts with dense fibroglandular tissue (1). In the last few years, several FFDM systems have become commercially available because of the potential of digital imaging to improve breast cancer detection.

Several clinical trials have been conducted to compare radiologists' interpretation of FFDMs and SFMs. Lewin et al (2,3) conducted a clinical study comparing FFDMs and SFMs for the detection of breast cancer in 6,737 examinations of women 40 years of age and older collected from two institutions. Forty-two cancers were detected within this population. The difference in cancer detection between FFDMs and SFMs was not statistically significant (P > .1). FFDMs resulted in fewer recalls than did SFMs, a difference that was statistically significant (P < .001). Another clinical trial (4), aimed at collecting data for US Food and Drug Administration approval, included SFMs and FFDMs of 676 women who were scheduled to undergo breast biopsy. The average area under the receiver operating characteristic (ROC) curve, sensitivity, and specificity were 0.715, 0.66, and 0.67, respectively, for printed FFDM and 0.765, 0.74, and 0.60 for SFM. However, none of these differences achieved statistical significance. Skaane et al (5-7) have conducted several clinical studies comparing SFM and FFDM with soft-copy interpretation for reader performance in the detection and classification of breast lesions. According to their findings, there was no significant difference between FFDM and SFM in either detection or classification. A recent study by Pisano et al (1) enrolled a total of 49,528 patients at 33 sites in the United States and Canada. Mammograms were interpreted independently by two radiologists. The overall diagnostic accuracy of FFDMs and SFMs for breast cancer was similar. However, FFDM was more accurate in women younger than 50 years, women with radiographically dense breasts, and premenopausal or perimenopausal women.

Studies indicate that radiologists do not detect all carcinomas that are visible on retrospective analyses of the images (8-14). Computer-aided diagnosis (CAD) is considered to be one of the promising approaches that may improve the sensitivity of mammography (15,16). Most of the mammographic CAD systems developed so far are based on digitized SFMs. Li et al (17) attempted to adapt their CAD system developed on SFMs for detection of masses on FFDMs by standardizing the FFDMs. Their preliminary results on a small dataset (training on 36 normal and 24 mass cases, testing on 24 normal and 10 mass cases) showed 60% sensitivity at 2.47 false positives (FPs)/image. Several commercial CAD systems have reported comparable performance on FFDMs and SFMs. However, these results have not been published in peer-reviewed journals, so the datasets and algorithms are unknown. So far, there have been no studies comparing breast mass detection by a CAD system on FFDMs and SFMs from the same patients. We have developed a CAD system for mass detection on SFMs (18,19) and are adapting the system to FFDMs. Our preliminary study with 65 patients was reported previously (20). In this study, we compared the performance of the two CAD systems on case-matched pairs of FFDMs and SFMs.


MATERIALS  AND  METHODS 


Materials 

Our study group consisted of patients with breast lesions that were categorized as suspicious and recommended for biopsy. The patients had either FFDM or SFM for their clinical exams; institutional review board approval and patient informed consent were obtained to acquire corresponding mammograms of the breast to be biopsied using the other modality. Therefore, the corresponding FFDM and SFM were available from only one breast of each patient. The time interval between the SFM and the FFDM ranged from 0 to 118 days. The dataset consisted of 229 patients aged 30-86 years (mean age, 55 ± 11 years). All cases had two mammographic views, the craniocaudal view and either the mediolateral oblique view or the lateral view, yielding a total of 458 FFDMs and 458 corresponding SFMs. The SFMs were acquired with MinR 2000 screen-film systems (Eastman Kodak, Rochester, NY) and digitized with a LUMISCAN 85 laser film scanner (Lumisys, Los Altos, CA) at a pixel resolution of 50 µm × 50 µm and 4096 gray levels. The digitizer was calibrated so that gray-level values were linearly proportional to the optical density in the range of 0-4, with a slope of 0.001 per pixel value. The digitizer output was linearly converted so that a large pixel value corresponded to a low optical density. FFDMs were acquired
with a GE Senographe 2000D system (GE Medical Systems, Milwaukee, WI). The GE system has a CsI phosphor/a:Si active matrix flat panel digital detector with a pixel size of 100 µm × 100 µm and 14 bits per pixel. The raw FFDMs were used as the input of our CAD system.

Table 1
Description of Cases in the Mass Datasets and Subsets for Training and Testing in the Twofold Cross-Validation Scheme

                                              Mass Set      Mass Subset 1   Mass Subset 2
                                              FFDM   SFM    FFDM   SFM      FFDM   SFM
Total number of cases                          131   131      65    65        66    66
Total number of images                         262   262     130   130       132   132
Number of visible masses (by case)             131   130      65    65        66    65
Number of masses only visible on one view        8     9       5     5         3     4
Number of visible masses (by image)            254   251     125   125       129   126
Number of visible malignant masses              27    27      12    12        15    15
Number of visible benign masses                104   103      53    53        51    50

FFDM: full-field digital mammogram; SFM: screen-film mammogram.
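The SFM digitizer calibration described in this section reduces to simple arithmetic, sketched below. The sketch assumes the calibration is exactly OD = 0.001 × gray level on the 12-bit (0-4095) scale, and that the "linear conversion" to make large pixel values correspond to low optical density is the inversion 4095 − value; the exact conversion is not stated in the text, so `invert_for_cad` is a hypothetical helper.

```python
MAX_LEVEL = 4095       # 4096 gray levels from the digitizer (0..4095)
OD_PER_LEVEL = 0.001   # calibration slope stated in the text

def optical_density(raw_value):
    """Optical density implied by the linear calibration OD = 0.001 x value."""
    if not 0 <= raw_value <= MAX_LEVEL:
        raise ValueError("raw value outside the 12-bit range")
    return OD_PER_LEVEL * raw_value

def invert_for_cad(raw_value):
    """Hypothetical linear inversion so a high pixel value means low optical density."""
    return MAX_LEVEL - raw_value

# A dark film region (high OD) and a bright film region (low OD)
for raw in (3500, 500):
    print(raw, round(optical_density(raw), 3), invert_for_cad(raw))
```

Under this assumption the full digitizer range spans roughly 0 to 4.1 OD, consistent with the stated 0-4 calibration range.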

The dataset included 131 cases containing masses and 98 cases containing microcalcifications without a visible mass, as determined by visual inspection by an experienced radiologist. The 131 cases will be referred to as the mass dataset and the 98 cases as the "no-mass" dataset in the following discussion. The no-mass cases were considered "normal" with respect to masses and were used to estimate the FP mark rates of the CAD systems during testing. The mass dataset contained 131 biopsy-proven masses, of which 27 were malignant and 104 benign. By examining all available information, including the diagnostic mammograms and reports, an experienced Mammography Quality Standards Act (MQSA) radiologist identified the true locations of the masses. Of these 131 masses, 1 is visible only on FFDMs, 7 are visible on only one view on both FFDMs and SFMs, and 3 are visible on only one view on either FFDMs (1 mass) or SFMs (2 masses). There were therefore 131 visible masses on FFDMs and 130 visible masses on SFMs if the masses were counted by case. There were 254 visible and 8 invisible masses on FFDMs and 251 visible and 11 invisible masses on SFMs if the masses were counted independently by mammographic view. The numbers of images and masses in the mass dataset are given in Table 1. Figure 1 shows an example with a 7-mm malignant mass. The size of a mass was estimated as its longest diameter seen on the mammograms. The visibility of the masses was rated by the experienced radiologist on a 10-point scale, with 1 representing the most visible masses and 10 the most difficult cases relative to the cases seen in clinical practice. Figures 2 and 3 show the histograms of mass sizes and visibility, respectively, for the mass set. The mass size ranged from 3 to 30 mm (mean: 12.5 ± 4.9 mm on FFDMs and 12.6 ± 4.9 mm on SFMs), and the visibility ratings extended over the entire range. Figure 4 shows the breast density in terms of BI-RADS category as estimated by the radiologist for the FFDM and SFM datasets.

Figure 1. An example of mammograms with a region of interest (ROI) containing a malignant mass with a size of 7 mm. (a) Full-field digital mammogram (FFDM) processed with the Laplacian pyramid multiscale method, (b) digitized screen-film mammogram (SFM), (c) magnified ROI on FFDM, and (d) magnified ROI on SFM. The SFM is displayed with the same resolution as that of the FFDM. The apparently smaller breast size on SFM is mainly caused by the very dark breast periphery region on the SFM that cannot be seen on the printed page.
Figure 2. Histogram of the sizes for 254 masses on full-field digital mammograms (FFDMs) and 251 masses on the screen-film mammograms (SFMs) in our dataset. Mass sizes are measured as the longest dimension of the mass by an experienced Mammography Quality Standards Act (MQSA) radiologist. The size of the masses in the dataset ranged from 3 to 30 mm (mean: 12.5 ± 4.9 mm on FFDMs and 12.6 ± 4.9 mm on SFMs).




Figure  4.  Distribution  of  the  breast  density  for  the  229  cases  in 
terms  of  BI-RADS  category  estimated  by  an  MQSA  radiologist. 



Figure 3. Histogram of the visibility of the 254 masses seen on full-field digital mammograms and 251 masses seen on screen-film mammograms in our dataset. The visibility is evaluated on a 10-point rating scale, with 1 representing the most visible masses and 10 the most difficult cases relative to the cases seen in clinical practice. Each mass on a mammogram is rated independently by an experienced MQSA radiologist.


Methods


CAD  System 

The major steps in the mass detection systems for FFDMs and SFMs are similar, but the feature spaces and classifiers for FP reduction in each system were designed separately to suit the characteristics of FFDMs and SFMs. The two systems are therefore described together, and the differences are pointed out where applicable. Each CAD system consists of five processing steps: 1) preprocessing, 2) prescreening of mass candidates, 3) segmentation of suspicious objects, 4) feature extraction and analysis, and 5) FP reduction by classification of normal tissue structures and masses.

FFDMs are generally preprocessed with proprietary methods by the manufacturer of the FFDM system before being displayed to readers, and the preprocessing method differs from manufacturer to manufacturer. To develop a CAD system that is less dependent on the FFDM manufacturer's proprietary preprocessing methods, we use the raw FFDM as input to our CAD system. We have previously developed a multiscale preprocessing scheme for image enhancement (21). In brief, the raw mammogram is first segmented automatically into the background and the breast region. A logarithmic transform is applied to the image, which is then scaled to 12 bits. The Laplacian pyramid method (21,22) is used to decompose the transformed breast image into multiple scales. A nonlinear weight function based on the pixel gray level of each of the low-pass components is designed to enhance the high-pass components. The processed image is reconstructed by summing the weighted components.
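The multiscale enhancement described above can be pictured with the following NumPy sketch. It is not the published implementation (21): it uses 2 × 2 block averaging and nearest-neighbour upsampling in place of a Gaussian pyramid kernel, requires image dimensions divisible by 2 to the power `levels`, skips the breast/background segmentation, and uses a hypothetical `default_gain` as the nonlinear weight, because the actual weight function is not given here.

```python
import numpy as np

def log_scale_12bit(raw):
    """Log-transform a raw (linear-response) image and rescale it to 0..4095."""
    logged = np.log1p(raw.astype(np.float64))
    lo, hi = logged.min(), logged.max()
    if hi == lo:
        return np.zeros_like(logged)
    return (logged - lo) / (hi - lo) * 4095.0

def _down(img):
    """2x2 block average (assumes both dimensions are even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def _up(img):
    """Nearest-neighbour 2x upsampling."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def laplacian_decompose(img, levels):
    """Return (high-pass components, low-pass components), finest scale first."""
    highs, lows = [], []
    current = img
    for _ in range(levels):
        low = _down(current)
        highs.append(current - _up(low))  # detail lost by the low-pass step
        lows.append(low)
        current = low
    return highs, lows

def default_gain(low):
    """Hypothetical nonlinear weight driven by the low-pass gray level."""
    return 1.0 + np.sqrt(np.clip(low, 0.0, 4095.0) / 4095.0)

def enhance(img, levels=3, gain=default_gain):
    """Weight each high-pass component by gain(low-pass level) and re-sum."""
    highs, lows = laplacian_decompose(img, levels)
    out = lows[-1]  # coarsest low-pass image
    for high, low in zip(reversed(highs), reversed(lows)):
        out = _up(out) + gain(_up(low)) * high
    return out

raw = np.arange(1, 65, dtype=np.float64).reshape(8, 8)
processed = enhance(log_scale_12bit(raw), levels=3)
```

With a gain of 1 everywhere the reconstruction returns its input exactly, which is a convenient sanity check; a gain above 1 amplifies the high-pass detail at the corresponding gray levels.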

For SFMs, the full-resolution digitized mammograms are smoothed with a 2 × 2 box filter and subsampled by a factor of 2, resulting in images with a pixel size of 100 µm × 100 µm. These images are used as input to the CAD system.
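Smoothing with a 2 × 2 box filter and then keeping every second sample in each direction amounts to averaging non-overlapping 2 × 2 blocks, assuming the subsampling grid is aligned with the filter. A plain-Python sketch, assuming the image is a list of pixel rows and that a trailing odd row or column is simply dropped:

```python
def smooth_and_subsample(image):
    """2x2 box filter followed by 2x subsampling.

    image: list of rows of 50-um pixels; returns rows of 100-um pixels.
    A trailing odd row or column is dropped (an assumption of this sketch).
    """
    out = []
    for r in range(0, len(image) - 1, 2):
        row = []
        for c in range(0, len(image[0]) - 1, 2):
            total = image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]
            row.append(total / 4.0)  # mean of the 2x2 block
        out.append(row)
    return out

INPUT_PIXEL_UM = 50
print(smooth_and_subsample([[0, 2, 4, 6], [2, 4, 6, 8]]), INPUT_PIXEL_UM * 2)
```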

After preprocessing, a two-stage gradient field analysis method (21,23) is used to identify the mass candidates on either FFDMs or SFMs. In brief, a gradient field analysis is employed in the first stage to identify potential mass candidates based on high values of the initial gradient field. Each potential mass candidate is segmented by a region growing technique. The shape and the gray-level information of the segmented object allow adaptive refinement of the gradient field analysis in the second stage. Locations of high radial gradient convergence are then labeled as mass candidates. These suspicious objects are segmented with a k-means clustering method (24). First, a 256 × 256 pixel region of interest (ROI) centered at the high gradient point is background-corrected (25) and weighted by a Gaussian function with σ = 256 pixels. K-means clustering using the pixel values in a background-corrected image and a Sobel-filtered image as features is then used to segment the object.
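A minimal NumPy sketch of the k-means segmentation step follows. Several details are assumptions not stated in the text: the input ROI is taken as already background-corrected (the correction of reference 25 is not reproduced), the Sobel magnitude is computed on the Gaussian-weighted ROI, both features are standardized, two clusters are initialized at the darkest and brightest pixels, and the brighter cluster is taken as the object.

```python
import numpy as np

def gaussian_weight(roi, sigma=256.0):
    """Multiply the ROI by a 2D Gaussian centered on the ROI center."""
    h, w = roi.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return roi * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sigma ** 2))

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel operator (edge-replicated border)."""
    p = np.pad(img, 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)

def kmeans_object_mask(roi_bc, sigma=256.0, iters=20):
    """Two-cluster k-means on (weighted gray level, Sobel magnitude) per pixel.

    roi_bc is assumed to be background-corrected already (assumption).
    """
    weighted = gaussian_weight(roi_bc, sigma)
    feats = np.column_stack([weighted.ravel(), sobel_magnitude(weighted).ravel()])
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-12)
    order = np.argsort(feats[:, 0])
    centers = feats[[order[0], order[-1]]].copy()  # darkest and brightest pixel
    for _ in range(iters):
        dist = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(2):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    obj = int(np.argmax(centers[:, 0]))  # brighter cluster taken as the object
    return (labels == obj).reshape(roi_bc.shape)

# Usage: a synthetic bright disk on a flat background
yy, xx = np.mgrid[0:32, 0:32]
roi = np.where((yy - 15.5) ** 2 + (xx - 15.5) ** 2 <= 36, 100.0, 0.0)
mask = kmeans_object_mask(roi)
```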

For each suspicious object, eleven morphologic features (18) are extracted. A rule-based classifier removes the detected structures that are substantially different from breast masses. Global and local multiresolution texture analyses (26) are performed in each ROI by using spatial gray-level dependence (SGLD) matrices. Thirteen SGLD texture measures are used. Global texture features are extracted from the entire ROI for two scales, seven distances, and two angles. Local texture features are extracted from the local region containing the detected object and the peripheral regions within each ROI for two scales, four distances, and two angles. Therefore, a total of 364 (13 × 2 × 7 × 2) and 208 (13 × 2 × 4 × 2) features are extracted from the global and local texture analyses, respectively. The feature space for final classification is the combination of the morphologic features and the SGLD texture features. Finally, linear discriminant analysis (LDA) is used to classify masses and normal tissue in the feature space. The discriminant scores are ranked for each mammogram, and only the three highest-ranked objects are kept; all objects ranked below the top three are eliminated.
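The texture and ranking steps above can be sketched in plain Python. This is an illustration under assumptions: it builds one symmetric, normalized SGLD (co-occurrence) matrix for a single displacement and computes only two of the thirteen measures (energy and contrast, in their standard Haralick form); the multiresolution scales, the specific distances and angles, and the local/peripheral region handling are not reproduced. The ranking helper assumes that a higher discriminant score means a more mass-like object.

```python
from collections import Counter

def sgld_matrix(img, dx, dy, levels):
    """Symmetric, normalized SGLD matrix for pixel pairs offset by (dx, dy)."""
    counts = Counter()
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = img[r][c], img[r2][c2]
                counts[(a, b)] += 1
                counts[(b, a)] += 1  # count each pair in both directions
    total = sum(counts.values())
    if total == 0:
        raise ValueError("displacement larger than the image")
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]

def energy(p):
    """Angular second moment: sum of squared matrix entries."""
    return sum(v * v for row in p for v in row)

def contrast(p):
    """Squared gray-level difference weighted by pair probability."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))

def texture_feature_count(measures, scales, distances, angles):
    """Number of features produced by the measure x scale x distance x angle grid."""
    return measures * scales * distances * angles

def keep_top_k(scores, k=3):
    """Indices of the k highest discriminant scores on one mammogram."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

p = sgld_matrix([[0, 1], [1, 0]], dx=1, dy=0, levels=2)
print(p, energy(p), contrast(p))
print(texture_feature_count(13, 2, 7, 2), texture_feature_count(13, 2, 4, 2))
print(keep_top_k([0.1, 0.9, -0.3, 0.5, 0.7]))
```

The feature-count helper reproduces the global (364) and local (208) totals stated in the text.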

Training and Testing the CAD Systems

Twofold cross-validation was used for training and testing our CAD system for FFDMs. We randomly separated the mass dataset by case into two independent subsets: subset 1 with 65 cases and subset 2 with 66 cases. The numbers of masses by image and by case for the FFDM and SFM subsets are shown in Table 1. The training included selection of proper parameters and features for the classifier in the CAD system. After the training with one mass subset was completed, the parameters and features were fixed for testing with the other mass subset. The training and test mass subsets were then switched, and the training and test processes were repeated. The trained CAD systems were also applied to the no-mass dataset, which was not used during training, to estimate the FP rate in screening mammograms.
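The case-level split and subset swap described above can be sketched as follows. Splitting by case ID (rather than by image) keeps both views of a case in the same subset; the seed value and the function names are illustrative assumptions, not details from the study.

```python
import random

def twofold_split(case_ids, seed=0):
    """Randomly split case IDs into two disjoint subsets.

    Splitting by case keeps both mammographic views of a case together.
    """
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

def twofold_runs(case_ids, seed=0):
    """(training subset, test subset) pairs; the subsets swap roles in run 2."""
    subset1, subset2 = twofold_split(case_ids, seed)
    return [(subset1, subset2), (subset2, subset1)]

runs = twofold_runs(range(131))
print([(len(train), len(test)) for train, test in runs])  # [(65, 66), (66, 65)]
```

Reusing the same grouping for the SFM test subsets, as the study does, would simply mean applying the same case-ID lists to the SFM images.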

During training, feature selection with stepwise LDA was applied to obtain the best feature subset and to reduce the dimensionality of the feature space, so that an effective classifier could be designed. The detailed procedure has been described elsewhere (21,27,28). Briefly, at each step one feature was entered into or removed from the feature pool by analyzing its effect on the selection criterion, which was chosen to be Wilks' lambda in this study. Because the appropriate threshold values for feature entry, feature elimination, and tolerance of feature correlation were unknown, we used an automated simplex optimization method to search for the best combination of thresholds in the parameter space. The simplex algorithm used a leave-one-case-out resampling method within the training subset to select features and estimate the weights for the LDA classifier. To obtain a figure of merit to guide feature selection, the test discriminant scores from the left-out cases were analyzed using ROC methodology (29). The accuracy for classification of masses and FPs was evaluated as the area under the ROC curve, Az, for the test cases. In this approach, feature selection was performed without the left-out case so that the test performance would be less optimistically biased (30). However, the selected feature set in each leave-one-case-out cycle could be slightly different because every cycle had one training case different from the other cycles. To obtain a single trained classifier to apply to the cross-validation test subset, a final stepwise feature selection was performed on the entire training subset with the best combination of thresholds found in the simplex optimization procedure, to obtain the final set of features and estimate the weights of the LDA. Note that the entire process of feature selection and classifier weight estimation was performed within the training subset. The LDA classifier with the selected feature set was then fixed and applied to the cross-validation test subset. The training and testing processes were performed independently for the two cross-validation folds.
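The selection criterion named above, Wilks' lambda, is the ratio of the determinant of the pooled within-class scatter matrix to that of the total scatter matrix, so smaller values indicate better class separation. The NumPy sketch below is a deliberate simplification of the procedure described in the text: it performs forward entry only (no removal step), stops at a fixed number of features instead of using entry, removal, and tolerance thresholds, and omits the simplex search and the leave-one-case-out loop.

```python
import numpy as np

def _scatter(Z):
    """Sum-of-squares-and-cross-products matrix about the mean."""
    c = Z - Z.mean(axis=0)
    return c.T @ c

def wilks_lambda(X, y, cols):
    """det(W) / det(T) for feature columns `cols` (smaller is better)."""
    Z = X[:, cols]
    W = sum(_scatter(Z[y == label]) for label in np.unique(y))
    return np.linalg.det(W) / np.linalg.det(_scatter(Z))

def forward_select(X, y, n_features):
    """Greedy forward entry: add the feature giving the smallest lambda."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_features:
        best = min(remaining, key=lambda j: wilks_lambda(X, y, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

def lda_weights(X, y, cols):
    """Two-class Fisher LDA direction (label 0 = normal tissue, 1 = mass)."""
    Z = X[:, cols]
    W = _scatter(Z[y == 0]) + _scatter(Z[y == 1])
    return np.linalg.solve(W, Z[y == 1].mean(axis=0) - Z[y == 0].mean(axis=0))

# Synthetic example: only feature 0 separates the two classes
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 3))
X[:, 0] += 3.0 * y
chosen = forward_select(X, y, 1)
```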

Because we had already trained our CAD system for SFMs with a large dataset in a previous study (19), we used the trained system without retraining the parameters in this study. For testing, we divided the SFMs into two test datasets that followed the same case grouping as that for FFDMs. The test cases in each subset did not overlap with any cases used for training the SFM CAD system in the previous study.

Evaluation  Methods 

We used a free-response ROC (FROC) method (31) to assess the overall performance of the CAD scheme on this image set. An FROC curve is obtained by plotting the mass detection sensitivity as a function of the number of FP marks per image as the decision threshold on the LDA classifier scores is varied.
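The threshold sweep that produces an FROC curve can be sketched as below. Each detection is assumed to be a (score, mass_id) pair, with mass_id set to None for an FP mark; keying detections by a per-image mass ID would give image-based sensitivity, while keying by a per-case mass ID would count a mass found on either view once, as in case-based analysis. The data structure is an assumption of this sketch.

```python
def froc_points(detections, n_true_masses, n_images):
    """FROC operating points from (score, mass_id) detections.

    mass_id is None for an FP mark. Returns (FP marks per image, sensitivity)
    pairs, one per distinct threshold, from strictest to most lenient.
    """
    points = []
    for t in sorted({s for s, _ in detections}, reverse=True):
        kept = [m for s, m in detections if s >= t]
        found = {m for m in kept if m is not None}  # each mass counted once
        n_fp = sum(1 for m in kept if m is None)
        points.append((n_fp / n_images, len(found) / n_true_masses))
    return points

dets = [(0.9, "A"), (0.8, None), (0.7, "B"), (0.6, None), (0.5, "A")]
print(froc_points(dets, n_true_masses=2, n_images=2))
```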

The detected individual objects were compared with the "true" mass locations marked by the experienced radiologist, as described previously. A detected object was labeled as a true positive (TP) if the overlap between the bounding box of the detected object and the bounding box of the true mass, relative to the larger of the two bounding boxes, was over 25%. Otherwise, it was labeled as an FP. The 25% threshold was selected as described in our previous study (18).
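The TP criterion above (intersection area divided by the area of the larger bounding box, compared with 25%) translates directly into code. The sketch assumes boxes given as (x0, y0, x1, y1) with x1 > x0 and y1 > y0; the coordinate convention is an assumption, since the text does not specify one.

```python
def box_area(box):
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)

def overlap_fraction(detected, truth):
    """Intersection area relative to the larger of the two bounding boxes."""
    ix = min(detected[2], truth[2]) - max(detected[0], truth[0])
    iy = min(detected[3], truth[3]) - max(detected[1], truth[1])
    inter = max(0, ix) * max(0, iy)
    larger = max(box_area(detected), box_area(truth))
    return inter / larger if larger > 0 else 0.0

def is_true_positive(detected, truth, threshold=0.25):
    """TP if the overlap fraction is strictly over the threshold."""
    return overlap_fraction(detected, truth) > threshold

print(overlap_fraction((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.5
print(is_true_positive((0, 0, 10, 10), (0, 0, 4, 4)))    # False: 16/100
```

Normalizing by the larger box penalizes both a detection that is much bigger than the mass and one that covers only a small part of it.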

FROC curves were presented on a per-image and a per-case basis. For image-based FROC analysis, the mass on each mammogram was considered an independent true object; the sensitivity was thus calculated relative to the number of masses by image in each subset of FFDMs or SFMs (Table 1). For case-based FROC analysis, the same mass imaged on the two-view mammograms was considered to be one true object, and detection of the mass on either or both views was considered a TP detection; the sensitivity was thus calculated relative to the number of masses by case in each subset of FFDMs or SFMs (Table 1). The test FROC curve for a given mass subset was estimated by counting the detected masses in the test mass subset for the sensitivity. The FP mark rate was estimated in two ways: one from FPs detected in the same test mass subsets, the other from FPs detected in the no-mass dataset. For the latter, we applied the trained CAD system to the entire no-mass dataset. The average number of FP marks per image produced by the CAD system at a given sensitivity was estimated by counting the detected objects in these cases at the corresponding decision threshold. Because we used twofold cross-validation for training and testing, we obtained two test FROC curves, one for each test subset, for each of the modalities. To summarize the results for comparison, an average test FROC curve was derived by averaging the FP rates at the same sensitivity along the FROC curves of the two corresponding test subsets.
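Averaging two test FROC curves at equal sensitivity, as described above, requires reading each curve's FP rate at a common set of sensitivity values. The sketch below assumes linear interpolation between measured operating points and skips any sensitivity that either curve does not span; both choices are assumptions, since the text does not say how intermediate values were obtained.

```python
def fp_at_sensitivity(curve, sensitivity):
    """Linearly interpolated FP rate at which `curve` reaches `sensitivity`.

    curve: (fp_per_image, sensitivity) points. Returns None when the
    sensitivity lies outside the range spanned by the curve.
    """
    pts = sorted(curve)
    for (f0, s0), (f1, s1) in zip(pts, pts[1:]):
        if s0 <= sensitivity <= s1:
            if s1 == s0:
                return f0
            return f0 + (sensitivity - s0) * (f1 - f0) / (s1 - s0)
    return None

def average_froc(curve_a, curve_b, sensitivities):
    """Average the FP rates of two FROC curves at equal sensitivity."""
    averaged = []
    for s in sensitivities:
        fa, fb = fp_at_sensitivity(curve_a, s), fp_at_sensitivity(curve_b, s)
        if fa is not None and fb is not None:
            averaged.append(((fa + fb) / 2.0, s))
    return averaged

subset1 = [(0.0, 0.5), (1.0, 0.7), (2.0, 0.9)]
subset2 = [(0.0, 0.6), (2.0, 1.0)]
avg = average_froc(subset1, subset2, [0.5, 0.7, 0.9])
```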

Figure 5. Comparison of free-response receiver operating characteristic (FROC) curves on full-field digital mammograms and screen-film mammograms during the prescreening stage. The FROC curves were generated by varying the number of detected suspicious objects per image based on the ranking of the local maxima on gradient field images. The FP rate was estimated from the mammograms with masses.

To compare the performance of our CAD system for FFDMs and SFMs statistically, we applied the alternative free-response ROC (AFROC) method and the jackknife free-response ROC (JAFROC) method developed by Chakraborty et al (32,33) to the pairs of FROC curves. In the AFROC method, the FROC data are first transformed by counting the number of false-positive images instead of the number of FPs per image. The LDA score of a false-positive image is determined by the highest-scoring FP object on the image, regardless of how many lower-scoring FP objects are marked on the same image. The ROCKIT curve-fitting software and statistical significance tests for ROC analysis developed by Metz et al (29) can then be used to analyze the AFROC data.
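The AFROC transformation described above (rating each image by its highest-scoring FP mark) can be sketched as follows. This covers only the empirical transform; the binormal curve fitting and significance tests of ROCKIT and the JAFROC analysis are not reproduced. Image indices, the use of negative infinity for images without FP marks, and the per-lesion score lists are assumptions of this sketch.

```python
def fp_image_ratings(fp_marks, n_images, no_fp=float("-inf")):
    """Highest FP score on each image; images without FP marks get `no_fp`."""
    best = [no_fp] * n_images
    for image_index, score in fp_marks:
        best[image_index] = max(best[image_index], score)
    return best

def afroc_point(lesion_scores, n_lesions, ratings, threshold):
    """(fraction of images with an FP at or above threshold, fraction of lesions found)."""
    llf = sum(1 for s in lesion_scores if s >= threshold) / n_lesions
    fpf = sum(1 for r in ratings if r >= threshold) / len(ratings)
    return fpf, llf

# Image 0 has two FP marks, image 1 none, image 2 one
ratings = fp_image_ratings([(0, 0.4), (0, 0.9), (2, 0.3)], n_images=3)
point = afroc_point([0.8, 0.2], n_lesions=2, ratings=ratings, threshold=0.5)
```

Sweeping the threshold over all distinct scores yields the empirical AFROC operating points that a curve-fitting program would then model.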


RESULTS 


For simplicity, we combined the detection results on the two test subsets from the twofold cross-validation process in the following discussion. The prescreening stage detected 91.3% (232/254) of the masses with an average of 10.13 (2,655/262) FPs/image on FFDMs, and 93.2% (234/251) of the masses with an average of 14.43 (3,781/262) FPs/image on SFMs. Figure 5 compares the FROC curves on FFDMs and SFMs during the prescreening stage. The FROC curves were generated by varying the number of detected suspicious objects per image based on the ranking of local maxima on the gradient field images.
We used two steps for FP reduction in both CAD systems. The first step was rule-based classification based on morphologic features. After this step, there were 2,572 mass candidates (9.8 objects/image) on FFDMs and 3,654 mass candidates (13.9 objects/image) on SFMs for the test sets of 262 images, without additional false negatives (FNs).

The second step was LDA classification. A total of 16 features (4 global texture features, 7 local texture features, and 5 morphologic features) and 12 features (4 global texture features, 4 local texture features, and 4 morphologic features) were selected from the two independent training subsets, respectively, for FFDMs. The feature set for SFMs contained a total of 21 features (11 global texture features, 7 local texture features, and 3 morphologic features), as obtained from previous training.

Figure 6 compares the average test FROC curves of the CAD systems for FFDMs and SFMs. The FFDM CAD system achieved case-based sensitivities of 70%, 80%, and 90% at 0.67, 1.15, and 1.93 FPs/image, respectively, compared with 0.75, 1.06, and 1.86 FPs/image for the SFM CAD system. Because two trained CAD systems were obtained for the FFDMs from the cross-validation training, we applied each of the trained systems to the no-mass dataset for FROC analysis and estimated the number of FP marks per image on the no-mass cases at each decision threshold. For each trained CAD system, the sensitivity was estimated from the detected masses in the test mass subset and plotted against the FP rate estimated from the no-mass set. Figure 7 shows the average FROC curves for FFDMs and SFMs, similar to those shown in Fig 6, except that the FP rates were estimated from the no-mass dataset.

The comparison of the FROC curves for the FFDM and SFM CAD systems in terms of the area under the fitted AFROC curve (A1) and the P values for both test mass subsets is summarized in Table 2. The differences in the A1 values between the two modalities did not achieve statistical significance (P > .05). The fitted AFROC curves, however, did not fit the transformed AFROC data very well, as discussed previously (21). For the JAFROC method, Chakraborty et al provided software to estimate the statistical significance of the difference between two FROC curves. The comparison of the figure of merit (FOM) and the P values is also summarized in Table 2. The differences in the FOMs between the FFDM and SFM CAD systems again did not achieve statistical significance (P > .05).

There were 27 malignant cases in the mass set. Figure 8 compares the average test FROC curves of the FFDM and SFM CAD systems for detection of malignant masses. The FP rate was estimated from the no-mass dataset. In this case, the FFDM CAD system achieved case-based sensitivities of 70%, 80%, and 90% at 0.37, 0.73, and 1.31 FP marks/image, respectively, substantially lower than the FP rates of 1.1, 1.6, and 2.0 FP marks/image for the SFM CAD system. However, the difference did not achieve statistical significance (P > .05).

Figure 6. Comparison of the average test free-response receiver operating characteristic (FROC) curves obtained by averaging the FROC curves of the two independent mass subsets on full-field digital mammograms and screen-film mammograms. The FP rate was estimated from the mammograms with masses. (a) Image-based FROC curves and (b) case-based FROC curves.

A total of 105 FFDM cases and 134 SFM cases were rated as BI-RADS density categories 3 and 4 by an MQSA radiologist (Fig 4). Of these, 88 cases (56 mass cases and 32 no-mass cases) were in common. Figure 9 compares the average test FROC curves of the FFDM and SFM CAD systems for mass detection on this common subset of dense breasts only. The FP rate was estimated from the 32 no-mass dense breasts. Although the FROC curve for the FFDMs appears to be slightly higher than that for the SFMs, the difference did not achieve statistical significance (P > .05).

Figure 7. Comparison of the average test free-response receiver operating characteristic (FROC) curves obtained by averaging the FROC curves of the two independent mass subsets on full-field digital mammograms and screen-film mammograms. The FP rate was estimated from the mammograms without masses. (a) Image-based FROC curves and (b) case-based FROC curves.


DISCUSSION 


CAD systems have been shown to be helpful as a second opinion to assist radiologists in the interpretation of SFMs. Recently, several studies have compared FFDM with SFM in screening cohorts (1,4,5,34). These clinical trials arrived at different conclusions about the advantages or disadvantages of FFDM compared with conventional SFM systems. Some of the differences may be attributed to factors such as the mammographic equipment, the study design, the sample sizes, and reader experience. It is also important to compare the performance of FFDM and SFM CAD systems. In our study, we compared the performance of the two systems on pairs of FFDMs and SFMs obtained from the same patients at close time intervals.

Several FFDM systems have been approved for clinical applications. Because digital detectors generally have a linear response to x-ray exposure, the raw pixel values are a linear function of the absorbed x-ray energy in the detector. To develop a CAD system that is less dependent on the FFDM manufacturer's proprietary preprocessing methods, we used the raw FFDMs as input to our CAD system. Although the spatial resolution and noise properties of the images from different detectors were still different, the use of raw images already reduced one of the major differences between mammograms from different FFDM systems. For preprocessing of the raw FFDMs, we developed a multiresolution enhancement method. From our observation of the SFMs and the processed FFDMs, the breast tissue on SFMs appears to be denser than that on FFDMs (35). This may be attributed to the harder beam quality used and the Laplacian enhancement on FFDMs. In this study, 134 SFM cases were rated as BI-RADS 3 and 4 categories by an MQSA radiologist, whereas only 105 FFDM cases were rated as BI-RADS 3 and 4. When the FFDM and SFM CAD systems were applied to the small common subset (56 with masses and 32 without masses) of dense breasts rated as BI-RADS 3 and 4, there was no significant difference between their average test FROC curves (Fig 9).
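The multiresolution (Laplacian) enhancement applied to the raw FFDMs is not specified in this section. Purely as an illustration of the general idea, the Python sketch below builds a generic Laplacian pyramid, scales each band-pass level by a gain, and reconstructs the image. The 5-tap binomial kernel, the replication-based upsampling, the number of levels, and the gain values are all assumptions of this sketch, not the parameters of the preprocessing used in this study.

```python
import numpy as np

# 5-tap binomial smoothing kernel (an assumed choice for this sketch).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(img: np.ndarray) -> np.ndarray:
    """Separable 5x5 binomial smoothing with reflected borders."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="reflect")
    rows = sum(KERNEL[i] * padded[i:i + h, :] for i in range(5))
    return sum(KERNEL[j] * rows[:, j:j + w] for j in range(5))


def expand(img: np.ndarray, shape: tuple) -> np.ndarray:
    """Upsample by pixel replication, crop to `shape`, then smooth."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return blur(up[:shape[0], :shape[1]])


def build_laplacian_pyramid(img: np.ndarray, levels: int):
    """Return (band-pass levels from fine to coarse, coarsest low-pass image)."""
    gauss = [img.astype(np.float64)]
    for _ in range(levels):
        gauss.append(blur(gauss[-1])[::2, ::2])
    bands = [g - expand(g_next, g.shape) for g, g_next in zip(gauss[:-1], gauss[1:])]
    return bands, gauss[-1]


def reconstruct(bands, base: np.ndarray) -> np.ndarray:
    """Invert the pyramid: add each band back onto the expanded coarser image."""
    img = base
    for band in reversed(bands):
        img = band + expand(img, band.shape)
    return img


def enhance(img: np.ndarray, gains) -> np.ndarray:
    """Scale each band-pass level by its gain (finest level first) and rebuild."""
    bands, base = build_laplacian_pyramid(img, len(gains))
    return reconstruct([g * b for g, b in zip(gains, bands)], base)
```

A gain of 1 at every level returns the input unchanged; gains above 1 amplify the corresponding spatial-frequency band, which is one way such processing can make parenchymal structures appear with different contrast than on SFMs.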

The overall performances of the CAD systems for the two modalities did not differ significantly in comparisons of either the subsets or the entire dataset. One factor may be the substantially smaller number of training samples used for the FFDM CAD system than for the SFM CAD system, which was trained with a set of 486 SFMs in a previous study (19). We have shown previously that a classifier designed with a larger number of training samples will have better generalization to unknown test cases (36). Furthermore, because our CAD system was originally developed on SFMs, some of the techniques used may favor SFMs. If new techniques are designed to specifically suit the properties of FFDMs, these biases may be reduced. Further investigations are underway to improve the FFDM CAD system.

Table 2
Estimation of the Statistical Significance of the Difference in the FROC Performances Between the FFDM and SFM CAD Systems

Measure        Cases       Test subset   FFDM   SFM    P value
A1 (AFROC)     All         1             0.48   0.42   .17
A1 (AFROC)     All         2             0.49   0.43   .16
A1 (AFROC)     Malignant   1             0.51   0.47   .56
A1 (AFROC)     Malignant   2             0.49   0.42   .23
FOM (JAFROC)   All         1             0.47   0.46   .73
FOM (JAFROC)   All         2             0.48   0.41   .33
FOM (JAFROC)   Malignant   1             0.55   0.48   .29
FOM (JAFROC)   Malignant   2             0.47   0.42   .59

The FROC curves with the FP marker rates obtained from the no-mass dataset were compared.
FROC, free-response receiver operating characteristic; FFDM, full-field digital mammogram; SFM, screen-film mammogram; CAD, computer-aided detection; AFROC, alternative free-response receiver operating characteristic; FOM, figure-of-merit; JAFROC, jackknife free-response ROC.

Figure 8. Comparison of the average test free-response receiver operating characteristic (FROC) curves of computer-aided detection systems on full-field digital mammograms and screen-film mammograms for mammograms with malignant masses. The FP rate was estimated from the mammograms without masses. (a) Image-based FROC curves and (b) case-based FROC curves.

We used a twofold cross-validation method for training and testing of the CAD systems. Feature selection and classifier weight design were performed within the training subset and thus were independent of the test subset. Kupinski et al (37) showed that feature selection and classifier weight design using the same training set of a limited size will introduce additional optimistic bias to the training result and thus additional pessimistic bias to the test result. Under the constraint of a limited training set, it is still unknown whether further splitting the training set into two subsets for separate feature selection and classifier weight design gains or loses in terms of bias, compared with using the entire set of available training samples for both processes. The relative efficiency of different resampling techniques in utilizing a limited dataset for classifier design with or without feature selection remains an important area for further study.

In screening mammography, the cancer rate is about 3-5 per 1,000. Most of the mammograms are normal. Therefore, some CAD researchers and users estimate the FP rate using normal mammograms (38-40) because it reflects how the CAD system performs in terms of specificity in a screening setting. Furthermore, for CAD systems that set a maximum number of detected objects at the output, estimating the number of FPs using images with lesions can potentially lead to an optimistic bias for the FROC curve because one of the detected objects will likely be the true lesion. The FP rate can thus be underestimated by as much as 1 per image. In addition, the JAFROC analysis requires that the FP rates be estimated on normal images. We therefore reported the FP rates of our CAD systems on both mammograms with masses and mammograms without masses to facilitate comparison with other CAD systems, whichever way other investigators evaluate their FP rates.

WEI ET AL
Academic Radiology, Vol 14, No 6, June 2007

Figure 9. Comparison of the average test free-response receiver operating characteristic (FROC) curves of computer-aided detection systems on full-field digital mammograms and screen-film mammograms for the common subset of 56 dense breasts with masses rated as BI-RADS 3 and 4. The FP rate was estimated from 32 no-mass dense breasts that were also rated as BI-RADS 3 and 4. (a) Image-based FROC curves and (b) case-based FROC curves.
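The underestimate of up to 1 FP per image can be illustrated with a toy counting example. The sketch below assumes a hypothetical CAD system that always outputs its top `k` candidates per image: on an image containing a lesion, one of those marks is usually the lesion, so at most `k - 1` marks can be FPs, whereas all `k` marks on a normal image are FPs. The function name and the scores are illustrative only.

```python
def fp_count(candidate_scores, is_lesion, k):
    """Number of FP marks when a hypothetical CAD system keeps its top-k candidates.

    candidate_scores: detection scores of the candidates on one image.
    is_lesion: parallel booleans marking which candidate is the true lesion.
    """
    ranked = sorted(zip(candidate_scores, is_lesion), key=lambda t: t[0], reverse=True)
    kept = ranked[:k]
    return sum(1 for _, lesion in kept if not lesion)


k = 3
normal_image = ([0.9, 0.7, 0.6, 0.2], [False, False, False, False])
lesion_image = ([0.9, 0.8, 0.6, 0.2], [False, True, False, False])
print(fp_count(*normal_image, k))  # 3: every kept mark is false
print(fp_count(*lesion_image, k))  # 2: one of the k marks is the lesion
```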

Although we collected case-matched cases for comparing the performances of the CAD systems for FFDMs and SFMs, the images may not be exactly matched. Variations in positioning, compression force, and the time difference between the two acquisitions would cause differences in the subtlety of the masses on the FFDMs and SFMs. However, assuming that the differences are random, both datasets would include images that have better or worse positioning, for example, than the corresponding images of the other modality. The differences in the various factors would likely be averaged out over the entire dataset. We expect that they would not cause substantial bias in the comparison of the relative performances of the CAD systems for the two modalities.

For a CAD system, its performance for detecting malignant masses is more important than its performance for detecting all masses. We have only 27 malignant cases in this dataset. Although the FROC curves for detection of malignant masses (Fig 8) indicated that the FFDM CAD system had a higher sensitivity than the SFM CAD system, the differences in the A1 and the FOM did not achieve statistical significance (P > .05) for either test subset, as shown in Table 2. A larger dataset is being collected for further comparison of the FFDM and SFM CAD systems for breast cancer cases.


Conclusion 

We compared the performance of our CAD systems for detection of breast masses on case-matched FFDM images and SFM images. The two CAD systems used similar computer vision techniques, but their preprocessing methods were different and the FP classifiers were separately trained to adapt to the image properties of each modality. From the comparison of FROC curves, it was found that the FFDM CAD system achieved higher detection sensitivity than the SFM CAD system at the same FP rates for malignant cases. However, the performances of our FFDM and SFM CAD systems for the entire dataset were similar. The differences between the two modalities were not statistically significant with either the AFROC or the JAFROC method for either the entire dataset or the malignant cases alone. Further study is under way to collect a larger dataset and to improve the performances of both systems.
We have developed a false positive (FP) reduction method based on analysis of bilateral mammograms for computerized mass detection systems. The mass candidates on each view were first detected by our unilateral computer-aided detection (CAD) system. For each detected object, a regional registration technique was used to define a region of interest (ROI) that is "symmetrical" to the object location on the contralateral mammogram. Texture features derived from the spatial gray level dependence matrices and morphological features were extracted from the ROI containing the detected object on a mammogram and its corresponding ROI on the contralateral mammogram. Bilateral features were then generated from corresponding pairs of unilateral features for each object. Two linear discriminant analysis (LDA) classifiers were trained from the unilateral and the bilateral feature spaces, respectively. Finally, the scores from the unilateral LDA classifier and the bilateral LDA asymmetry classifier were fused with a third LDA whose output score was used to distinguish true masses from FPs. A data set of 341 cases of bilateral two-view mammograms was used in this study, of which 276 cases with 552 bilateral pairs contained 110 malignant and 166 benign biopsy-proven masses and 65 cases with 130 bilateral pairs were normal. The mass data set was divided into two subsets for twofold cross-validation training and testing. The normal data set was used for estimation of FP rates. Our bilateral CAD system achieved a case-based sensitivity of 70%, 80%, and 85% at average FP rates of 0.35, 0.75, and 0.95 FPs/image, respectively, on the test data sets with malignant masses. In comparison to the average FP rates for the unilateral CAD system of 0.58, 1.33, and 1.63 FPs/image, respectively, at the corresponding sensitivities, the FP rates were reduced by 40%, 44%, and 42% with the bilateral symmetry information. The improvement was statistically significant (p < 0.05) as estimated by JAFROC analysis. © 2007 American Association of Physicists in Medicine. [DOI: 10.1118/1.2756612]
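The percentage FP reductions quoted above follow directly from the two sets of average FP rates at matched sensitivities. A quick arithmetic check in Python, using only the numbers reported in the abstract:

```python
# Average FP rates (FPs/image) at case-based sensitivities of 70%, 80%, and 85%.
unilateral = [0.58, 1.33, 1.63]
bilateral = [0.35, 0.75, 0.95]


def percent_reduction(before, after):
    """Relative reduction of the FP rate, in percent."""
    return 100.0 * (before - after) / before


reductions = [round(percent_reduction(u, b)) for u, b in zip(unilateral, bilateral)]
print(reductions)  # [40, 44, 42]
```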

Key  words:  computer-aided  detection  (CAD),  bilateral  analysis,  mass  detection,  false  positive 
reduction 


I.  INTRODUCTION 

Breast cancer is one of the leading causes of death among American women between 40 and 55 years of age.1 It has been reported that early diagnosis and treatment can significantly improve the chance of survival for patients with breast cancer.2-4 Although mammography is a powerful screening tool for detecting breast cancer,5,6 studies indicate that a substantial fraction of breast cancers that are visible upon retrospective analyses of the images are not detected initially. It has been shown that computer-aided detection (CAD) can increase the cancer detection rate by radiologists both in the laboratory and in clinical practice.10-15

In screening mammography, two mammographic views, the craniocaudal (CC) and mediolateral oblique (MLO) views, are generally taken of each breast. During mammographic interpretation, the radiologist combines complex information, including the morphology, texture, and geometric location of any suspicious structures of the imaged breast from different views, asymmetric density patterns between bilateral mammograms of the same view, and changes between the current and the prior mammograms if available. Radiologists have found that these techniques are effective in improving the accuracy of detecting subtle lesions and reducing false positives (FPs).

Investigators have attempted to implement these multiple-image techniques in CAD systems to improve the detection accuracy of abnormalities and the classification accuracy of differentiating malignant and benign lesions. Hadjiiski et al.16 developed an interval change analysis of masses on current and prior mammograms and found that the classification accuracy of masses can be improved significantly in comparison to single-image classification. Paquerault et al. developed a two-view (CC and MLO views) fusion technique to reduce FPs in mass detection and obtained significant improvement in comparison to their one-view detection system. Van Engeland et al. recently presented a two-view CAD system using features that include the difference in the radial distance from the candidate regions to the nipple, the gray scale correlation between both regions, and the mass likelihood of the regions determined by the single-view CAD scheme. Yin et al.19 used bilateral subtraction in a prescreening step of a mass detection program to locate mass candidates, but the subsequent image analysis was based only on a single view. Mendez et al.20 developed a bilateral CAD system based on a bilateral subtraction approach and used size and eccentricity tests and texture features to eliminate FPs. Again, the bilateral information is used only to find the suspicious objects, and the subsequent analysis is based on a single view.

Fig. 1. The characteristics of our mass data set: (a) distribution of mass sizes, (b) distribution of mass shapes, (c) distribution of mass margins (C: circumscribed, Ind: indistinct, M: microlobulated, Ob: obscured, Sp: spiculated), and (d) distribution of the breast density in terms of BI-RADS category estimated by an MQSA radiologist.

The detection of masses on mammograms is a challenging task. The normal fibroglandular tissue in the breast causes FPs by mimicking masses and causes false negatives (FNs) by overlapping with lesions. To improve the performance of our mass detection system, we are investigating computer-vision methods that incorporate information from two-view mammograms17 and bilateral mammograms,21 emulating radiologists' mammographic interpretation techniques. In this study, we discuss our approach to FP reduction by analyzing the symmetry or asymmetry of density patterns between bilateral mammograms.

II.  MATERIALS  AND  METHODS 
A.  Data  sets 

A database of mammograms was collected from patient files at the Department of Radiology with Institutional Review Board approval. The mammograms were digitized by a Lumiscan laser scanner with a pixel size of 50 μm × 50 μm and 12 bits per pixel. The pixel size was increased to 100 μm × 100 μm by averaging every 2 × 2 adjacent pixels before the images were input to the CAD system. In this study, two data sets are used: a mass data set containing bilateral digitized mammograms with malignant or benign masses, and a no-mass data set containing bilateral digitized mammograms without masses, as verified by an experienced radiologist. All cases had four mammographic views: the CC view and the MLO view mammogram for both breasts. The mass data set and the no-mass data set contained 276 cases (552 bilateral pairs) and 65 cases (130 bilateral pairs), respectively, yielding a total of 1364 mammograms. The mass data set was used to estimate the detection sensitivity, and the no-mass data set was used for estimating the FP rate (number of FPs per image). In the mass data set, each patient had a biopsy-proven mass in one of the breasts, resulting in a total of 276 masses, 166 of which were benign and 110 malignant. A Mammography Quality Standards Act (MQSA) radiologist identified the location of the masses based on all available diagnostic and clinical information of the case, measured the mass sizes as the longest dimension seen on the two-view mammograms, provided descriptors of the mass shapes and mass margins, and also provided an estimate of the breast density in terms of the Breast Imaging Reporting and Data System (BI-RADS) category. Figure 1 shows the characteristics of our data set, including the distributions of mass sizes, mass shapes, mass margins, and breast density.
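The 2 × 2 pixel averaging that converts the 50 μm scans to 100 μm images can be sketched with NumPy as a block mean. Dropping an odd trailing row or column, and returning floating-point rather than 12-bit integer values, are assumptions of this sketch; the section does not say how those details were handled.

```python
import numpy as np


def average_2x2(img: np.ndarray) -> np.ndarray:
    """Halve the resolution by averaging each non-overlapping 2x2 pixel block."""
    h, w = img.shape
    # Drop an odd trailing row/column so the image tiles exactly (an assumption).
    h2, w2 = h - h % 2, w - w % 2
    blocks = img[:h2, :w2].astype(np.float64).reshape(h2 // 2, 2, w2 // 2, 2)
    return blocks.mean(axis=(1, 3))


scan_50um = np.arange(16).reshape(4, 4)  # toy 4x4 "scan"
print(average_2x2(scan_50um))            # 2x2 image at twice the pixel pitch
```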

For training and evaluation of the performances of the CAD systems, the cases in our mass data set were divided into two independent data subsets containing 136 and 140 cases, respectively, for twofold cross-validation training and testing. Of the 136 cases in subset 1, 52 were malignant and 84 were benign. Of the 140 cases in subset 2, 58 were malignant and 82 were benign. The no-mass data set was not used during training. All 260 no-mass mammograms were kept as independent test samples to be used with both test subsets.
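The case-level twofold protocol can be sketched as follows. Splitting by case keeps all views of a patient in the same subset, and the no-mass images are never used for training but serve as the FP test set in both cycles. The alternating, pathology-stratified assignment below is an assumption for illustration: it yields two equal subsets of 138 cases, whereas the subsets in this study contained 136 and 140 cases.

```python
import random


def twofold_split(mass_cases, seed=0):
    """Split (case_id, is_malignant) tuples into two disjoint case-level subsets.

    Stratifying by pathology is an assumption of this sketch; the text does not
    state how cases were assigned to the two subsets.
    """
    rng = random.Random(seed)
    subsets = ([], [])
    for label in (True, False):
        group = [case for case in mass_cases if case[1] == label]
        rng.shuffle(group)
        for i, case in enumerate(group):
            subsets[i % 2].append(case)
    return subsets


def cross_validation_cycles(mass_cases, normal_images, seed=0):
    """Two train/test cycles; normal images are only ever used as the FP test set."""
    s1, s2 = twofold_split(mass_cases, seed)
    return [
        {"train": s1, "test": s2, "fp_test": normal_images},
        {"train": s2, "test": s1, "fp_test": normal_images},
    ]


mass_cases = [(i, i < 110) for i in range(276)]  # 110 malignant, 166 benign
normals = [f"normal_{j}" for j in range(260)]    # hypothetical image IDs
cycles = cross_validation_cycles(mass_cases, normals)
```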

B.  Methods 

Our bilateral CAD system combines unilateral features with bilateral features to reduce FPs. Similar structures that appear in both right and left mammograms at corresponding locations are more likely to be normal tissue than masses, whereas asymmetric density may indicate a developing lesion. The key component of this system is therefore a classifier that can differentiate symmetry and asymmetry of paired regions of interest (ROIs) in corresponding regions on bilateral mammograms of the same view. The system consists of four steps: (1) mass candidate (MC) localization, (2) corresponding ROI (CR) registration, (3) feature extraction and analysis, and (4) bilateral information fusion. Figure 2 shows the block diagram of our bilateral CAD system. Each step is described in detail below.

Fig. 2. Block diagram of the bilateral CAD system for mass detection on mammograms.


1.  Mass  candidate  localization 

Identification of mass candidates is performed in two steps: breast segmentation and mass candidate detection. The breast image is first segmented from the surrounding image background by boundary detection.

The algorithm developed by Zhou et al. in our laboratory is used to track the breast boundary and segment the breast from the background. Mass detection is performed only in the breast region. We have previously developed a mass detection system for unilateral mammograms, and this system is used for mass candidate detection in the current study. The system performs mass detection in two steps. In the first step, a gradient field analysis method is used to determine the seeds of mass candidates, followed by a region growing24 method to segment the mass candidates starting from those seeds. In the second step, the gradient convergence is calculated using the gray levels and the shape of the segmented mass region as a priori information. The mass candidates that pass the gradient convergence criterion are retained for further analysis in the bilateral system. Figure 3 shows an example of mass candidates detected on a mammogram. Figures 3(a)-3(c) show the original image, the detected breast boundary, and the detected mass candidates, respectively.
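The gradient convergence measure itself is not defined in this section. As a generic illustration of the concept only (not the criterion used in our system), the sketch below computes a simple convergence index: the mean cosine between the image gradient and the direction pointing toward a candidate center, over pixels within a given radius. A bright, roughly round mass gives values near +1, and unstructured texture gives values near 0. The radius and the disk-shaped support are assumptions; the actual method uses the segmented region shape.

```python
import numpy as np


def convergence_index(img: np.ndarray, center, radius: float) -> float:
    """Mean cosine between gradients and the pixel-to-center direction.

    center: (row, col) of the candidate; radius: support of the disk (assumed).
    """
    gy, gx = np.gradient(img.astype(np.float64))  # derivatives along rows, cols
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dy = center[0] - yy                           # vector from pixel to center
    dx = center[1] - xx
    dist = np.hypot(dx, dy)
    mag = np.hypot(gx, gy)
    mask = (dist > 0) & (dist <= radius) & (mag > 1e-12)
    cos = (gx[mask] * dx[mask] + gy[mask] * dy[mask]) / (mag[mask] * dist[mask])
    return float(cos.mean()) if cos.size else 0.0
```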




Fig. 3. An example of mass candidate identification: (a) an original mammogram, (b) the detected breast boundary of (a), with a mass marked by the arrow, and (c) the detected mass candidates of (a).


2.  Corresponding  ROI  registration 

For each mass candidate, its corresponding ROI on the contralateral mammogram is identified by the regional registration technique developed previously in our laboratory,16 with a modification to handle the special case when the distance between the nipple location and the center of an ROI is too small to obtain the intersection points on the breast boundary. The nipple location on each image was manually identified so that the effectiveness of the bilateral analysis method could be evaluated independently of nipple detection errors.

The original regional registration technique included the following steps. The registration is performed in a polar coordinate system whose origin is located at the nipple location of a breast image. Figure 4 shows an example of locating the corresponding ROI of a mass candidate on the contralateral mammogram. Using the distance r from the nipple o to the center m of the mass as the radius, an arc centered at the origin (nipple) is drawn. The arc passes through the mass candidate and intersects the breast boundary at two points, p and q. The angle between om and op is defined as θ, and the angle between op and oq is defined as α. On the contralateral mammogram, the corresponding ROI m' is localized with a similar procedure. An arc of radius r centered at the nipple o' of the contralateral mammogram is drawn. The intersections of the arc with the breast boundary are p' and q'. The angle


Fig.  4.  An  example  of  obtaining  the  corresponding  ROI  of  a  mass  candidate 
on  the  contralateral  mammogram:  (a)  mass  candidate  on  the  left  MLO  view 
at  m  and  (b)  corresponding  ROI  on  the  right  MLO  view  at  m' . 
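The polar quantities defined above (r, θ, and α) can be computed directly from the nipple, mass center, and boundary intersection points. The description of how m' is finally placed is truncated in this section, so the mapping step below is a hypothetical rule assumed only for illustration: m' lies on the arc of radius r around o', rotated from o'p' toward o'q' by θ' = θ·α'/α, where α' is the angle between o'p' and o'q'. Finding p and q on a segmented boundary is also outside this sketch; the points are given as inputs.

```python
import math


def angle_between(u, v):
    """Unsigned angle (radians) between 2-D vectors u and v."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return abs(math.atan2(cross, dot))


def polar_params(o, m, p, q):
    """Return (r, theta, alpha) for nipple o, mass center m, boundary points p, q."""
    om = (m[0] - o[0], m[1] - o[1])
    op = (p[0] - o[0], p[1] - o[1])
    oq = (q[0] - o[0], q[1] - o[1])
    return math.hypot(*om), angle_between(om, op), angle_between(op, oq)


def map_to_contralateral(r, theta, alpha, o2, p2, q2):
    """Hypothetical mapping: place m' so that theta'/alpha' equals theta/alpha."""
    op2 = (p2[0] - o2[0], p2[1] - o2[1])
    oq2 = (q2[0] - o2[0], q2[1] - o2[1])
    theta2 = theta * angle_between(op2, oq2) / alpha
    # Rotate from o'p' toward o'q' (sign given by the orientation of the pair).
    direction = 1.0 if op2[0] * oq2[1] - op2[1] * oq2[0] >= 0 else -1.0
    phi = math.atan2(op2[1], op2[0]) + direction * theta2
    return (o2[0] + r * math.cos(phi), o2[1] + r * math.sin(phi))
```

Because both images are parameterized relative to their own nipple and boundary, this kind of mapping tolerates differences in breast size and orientation between the left and right views.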
merged by a third LDA. The weights of this LDA classifier
were  also  trained  with  the  training  subset.  The  output  score 
from  the  third  LDA  is  used  to  differentiate  true  positives 
(TPs)  from  FPs  in  the  bilateral  CAD  system. 
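The three-classifier fusion can be sketched with a plain Fisher linear discriminant in NumPy: one LDA on the unilateral features, one on the bilateral features, and a third LDA trained on the two output scores. This is a minimal illustration, not our implementation. The small ridge term, the unweighted sum of class covariances, and training the fusion LDA on resubstitution scores are assumptions; feature selection is omitted.

```python
import numpy as np


def fit_lda(X, y):
    """Fisher linear discriminant returning (w, b) so that score = X @ w + b.

    X: (n_samples, n_features); y: 1 for mass, 0 for FP.
    A small ridge term keeps the covariance invertible (an assumption).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu1, mu0 = X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)
    cov = np.cov(X[y == 1], rowvar=False) + np.cov(X[y == 0], rowvar=False)
    cov = np.atleast_2d(cov) + 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(cov, mu1 - mu0)
    b = -0.5 * (mu1 + mu0) @ w
    return w, b


def lda_score(model, X):
    w, b = model
    return np.asarray(X, dtype=float) @ w + b


def train_fusion(X_uni, X_bi, y):
    """Train unilateral and bilateral LDAs, then a third LDA on their two scores."""
    uni, bi = fit_lda(X_uni, y), fit_lda(X_bi, y)
    scores = np.column_stack([lda_score(uni, X_uni), lda_score(bi, X_bi)])
    return uni, bi, fit_lda(scores, y)


def fused_score(models, X_uni, X_bi):
    """Final score used to separate TPs from FPs."""
    uni, bi, fusion = models
    scores = np.column_stack([lda_score(uni, X_uni), lda_score(bi, X_bi)])
    return lda_score(fusion, scores)
```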


5.  Evaluation  methods 

The detected individual objects were compared with the true mass location marked by an experienced radiologist. An object was considered to be a TP if the overlap between the detected object and the true mass was greater than 25%. The 25% threshold was selected as described in our previous study.30
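The TP criterion can be sketched on binary region masks. How "overlap" is normalized is not restated here (it is defined in the cited study), so this sketch assumes intersection over union of the detected object and the radiologist-marked mass; other normalizations, such as intersection over the true-mass area, would change the numbers.

```python
import numpy as np


def overlap_ratio(detected, truth):
    """Intersection over union of two boolean masks (assumed overlap definition)."""
    detected = np.asarray(detected, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(detected, truth).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(detected, truth).sum() / union)


def is_true_positive(detected, truth, threshold=0.25):
    """An object counts as a TP when the overlap exceeds the threshold."""
    return overlap_ratio(detected, truth) > threshold
```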

To evaluate the performance of our bilateral LDA classifier, the test discriminant scores were analyzed using receiver operating characteristic (ROC) methodology.31 The accuracy for classification of mass and normal tissue was evaluated as the area under the ROC curve, Az.
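Az values are usually estimated from a fitted (e.g., binormal) ROC curve, which this sketch does not reproduce. As a quick nonparametric cross-check, the empirical ROC area equals the Wilcoxon-Mann-Whitney statistic: the probability that a randomly chosen mass score exceeds a randomly chosen normal-tissue score, with ties counted as one half.

```python
def empirical_auc(pos_scores, neg_scores):
    """Nonparametric ROC area: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

The double loop is quadratic in the number of samples, which is acceptable for a check; a rank-based formula gives the same value faster on large candidate sets.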

The detection performance of the bilateral CAD system was assessed by free-response ROC (FROC) analysis. An FROC curve shows the relationship between the detection sensitivity and the FP rate as the decision threshold varies. FROC curves were presented on a per-image and a per-case basis. For image-based FROC analysis, the mass on each mammogram was considered an independent true object. For case-based FROC analysis, the same mass imaged on the two-view mammograms was considered to be one true object, and detection of the mass on either view or on both views was considered to be a TP detection.
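The image-based and case-based counting rules can be illustrated by computing one FROC operating point at a given threshold. The data layout below is a hypothetical choice for this sketch: each mass view is keyed by `(case_id, view)` and holds the highest score of a TP mark on that view (or `None` if the mass was missed), while FPs are counted only on the normal images, as in the evaluation described next.

```python
def froc_point(lesion_scores, fp_scores, n_normal_images, threshold):
    """One FROC operating point at a decision threshold.

    lesion_scores: dict (case_id, view) -> highest TP-mark score on that view,
        or None if the mass was not detected on that view.
    fp_scores: scores of all marks on the normal (no-mass) images.
    Returns (image_based_sensitivity, case_based_sensitivity, fps_per_image).
    """
    hit = {key: s is not None and s >= threshold for key, s in lesion_scores.items()}
    image_sens = sum(hit.values()) / len(hit)
    cases = {}
    for (case_id, _view), detected in hit.items():
        # Case-based: the mass counts as detected if it is marked on either view.
        cases[case_id] = cases.get(case_id, False) or detected
    case_sens = sum(cases.values()) / len(cases)
    fps = sum(1 for s in fp_scores if s >= threshold) / n_normal_images
    return image_sens, case_sens, fps


lesions = {(1, "CC"): 0.9, (1, "MLO"): None, (2, "CC"): 0.4, (2, "MLO"): 0.6}
print(froc_point(lesions, [0.95, 0.7, 0.5, 0.3], 4, 0.5))  # (0.5, 1.0, 0.75)
```

Sweeping the threshold over all observed scores traces out the full image-based and case-based curves.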

Two sets of trained parameters were acquired as a result of the twofold cross-validation training. To estimate the FP rate on normal mammograms when the trained CAD system is used in a screening setting, we applied the trained unilateral and bilateral systems to the 260 no-mass mammograms for independent testing. The number of FP marks produced by the algorithm was estimated by counting the detected objects on these normal cases only. The mass sensitivity was determined by counting only the masses on the corresponding test mass subset. The combination of the sensitivity from the test mass subset and the FP rate from the normal data set at the corresponding detection thresholds resulted in a test FROC curve. The training and testing procedures were performed for each cycle of the twofold cross-validation process, thereby generating two test FROC curves. To estimate the overall performance of the CAD system, an average test FROC curve is obtained by averaging the FP rates from the FROC curves of the two mass subsets at the corresponding sensitivities.
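Averaging two test FROC curves at corresponding sensitivities can be sketched with `numpy.interp`. Linear interpolation between measured operating points is an assumption of this sketch; note also that `numpy.interp` clamps to the end values outside a curve's sensitivity range, so the requested sensitivities should lie within both curves.

```python
import numpy as np


def average_froc(curves, sensitivities):
    """Average the FP rates of several FROC curves at common sensitivity levels.

    curves: list of (fp_rates, sens) pairs, one per test subset, each sorted by
        increasing sensitivity (required by numpy.interp).
    """
    fp_at_sens = [np.interp(sensitivities, sens, fps) for fps, sens in curves]
    return np.mean(fp_at_sens, axis=0)


curve1 = ([0.2, 0.6, 1.0], [0.6, 0.8, 0.9])  # toy (FPs/image, sensitivity)
curve2 = ([0.4, 0.8, 1.2], [0.6, 0.8, 0.9])
print(average_froc([curve1, curve2], [0.7, 0.8]))
```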

Chakraborty et al. proposed a JAFROC method and provided software to estimate the statistical significance of the difference between two FROC curves. We employed the JAFROC analysis to evaluate the difference in the FROC curves obtained from the unilateral CAD system and the bilateral CAD system.
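For reference, the JAFROC figure of merit computed on normal images compares every lesion rating with the highest-rated FP mark on every normal image: a lesion scores 1 if it is rated higher, 0.5 for a tie, and 0 otherwise, and unmarked lesions or unmarked normal images are treated as rated below every mark. The sketch below computes only this figure of merit; the jackknife pseudovalues and ANOVA that Chakraborty's software uses for the significance test are not reproduced.

```python
def jafroc_fom(normal_max_fp, lesion_ratings):
    """JAFROC figure of merit estimated on normal images.

    normal_max_fp: highest-rated FP mark on each normal image (None if unmarked).
    lesion_ratings: rating of each lesion on the abnormal images (None if unmarked).
    """
    low = float("-inf")  # unmarked items rank below every mark
    xs = [low if x is None else x for x in normal_max_fp]
    ys = [low if y is None else y for y in lesion_ratings]
    total = 0.0
    for x in xs:
        for y in ys:
            total += 1.0 if y > x else 0.5 if y == x else 0.0
    return total / (len(xs) * len(ys))
```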


III.  RESULTS 

A.  Bilateral  feature  analysis 

Figures 6 and 7 show examples of detection results obtained from the unilateral system and the bilateral system. Figure 6 shows a mass that was initially detected as a mass candidate but was excluded in the false positive reduction steps and was therefore an FN of the unilateral CAD system. The bilateral analysis increased the likelihood score of this mass. It was therefore not excluded in the false positive reduction steps and became a TP in the bilateral CAD system.

Figure 7 shows an example of an FP detected by the unilateral CAD system. The FP was excluded in the bilateral system because the bilateral analysis found it to have high symmetry with the tissue in the contralateral breast, as shown in the ROI in Fig. 7(d).

B.  Performance  evaluation 

In the prescreening process, we obtained a large number of mass candidates on each mammogram. Each mass candidate was paired with a corresponding ROI in the contralateral breast. A total of 3127 and 3402 mass candidates were extracted for training subsets 1 and 2, respectively, which included 98.5% (134/136) and 99.3% (139/140) of the masses in the two subsets. The mass candidates in the unilateral mammograms and the ROI pairs from bilateral mammograms in the training subset were used to design the unilateral and bilateral classifiers in each of the twofold cross-validation cycles. The most effective subset of features from the available feature pool was selected for each of the training subsets during the training procedure. For the unilateral LDA classifier, 20 (11 global and 9 local) and 19 (12 global and 7 local) texture features were selected from the two independent training subsets, respectively. For the bilateral LDA classifier, 24 (11 global texture, 9 local texture, and 4 morphological) and 23 (12 global, 8 local, and 3 morphological) features were selected from the two independent training subsets, respectively. The validation Az values of the LDA classifier during the leave-one-case-out training were 0.846 ± 0.011 and 0.832 ± 0.009, respectively, for the two training subsets using the unilateral LDA classifier, and were 0.862 ± 0.015 and 0.859 ± 0.012, respectively, using the bilateral LDA classifier. The classifiers achieved Az values of 0.833 ± 0.015 and 0.831 ± 0.011, respectively, for the two test subsets using the unilateral LDA classifier, and 0.853 ± 0.013 and 0.849 ± 0.011, respectively, using the bilateral LDA classifier.

Fig. 6. (a) Mammogram containing a mass marked by the rectangular box. (b) The contralateral mammogram of (a); the rectangular box is the corresponding ROI of the mass in (a) estimated by the automated regional registration technique. (c) ROI extracted from (a) containing a mass detected at the prescreening stage but excluded at the final stage of the unilateral CAD system. (d) The corresponding ROI in the contralateral breast. Bilateral analysis of this ROI pair increased the likelihood score of the mass, which was then detected as a TP in the bilateral CAD system.

Fig. 7. (a) Mammogram and the rectangular ROI containing a mass candidate. (b) The contralateral mammogram of (a); the rectangular box is the corresponding ROI of the mass candidate in (a). (c) ROI extracted from (a) containing normal tissue detected at the prescreening stage and included as an FP at the final stage of the unilateral CAD system. (d) The corresponding ROI in the contralateral breast. Bilateral analysis of this ROI pair reduced the likelihood score of the normal tissue, which then became a true negative (TN) in the bilateral CAD system.

Figure 8 shows the average test FROC curves for the unilateral and bilateral CAD systems after FP reduction with the corresponding trained LDA classifiers when the FP rates were estimated from the test subsets with masses. Figure 9 shows the corresponding results when the FP rates were estimated on the set of no-mass mammograms. Table I summarizes the average FP rates estimated with both the mass and no-mass data sets at several case-based sensitivities.

Because the detection performance of CAD systems on cancer cases is of prime importance, we
analyzed the performance of our CAD systems for the subset of cases containing malignant masses.
Figure 10 compares the average test FROC curves for the unilateral and bilateral CAD systems on
malignant cases only. Figure 11 shows the average test FROC curves for the unilateral and bilateral
CAD systems with the sensitivities estimated on malignant cases only and the FP rates estimated on
the set of no-mass mammograms. The bilateral CAD system achieved case-based sensitivities of 70%,
80%, and 85% at average FP rates of 0.35, 0.75, and 0.95 FPs/image, respectively, on the test
subset of malignant masses. Compared with the average FP rates of the unilateral CAD system at the
same sensitivities (0.58, 1.33, and 1.63 FPs/image, respectively), the FP rates were reduced by
40%, 44%, and 42% with the bilateral symmetry
information.  Table  II  summarizes  the  average  FP  rates  esti- 


Fig. 8. (a) Image-based and (b) case-based average test FROC curves from the unilateral and the
bilateral CAD systems. The FP rates were estimated from detection on mammograms in the test
subsets with masses.
Fig. 9. (a) Image-based and (b) case-based average test FROC curves from the unilateral and the
bilateral CAD systems. The FP rates were estimated from detection on mammograms in the no-mass
data set.


Table I. The average FP reduction rates at case-based sensitivities of 70%, 80%, and 85% for the
test subsets when the FP rates were estimated from the mass and no-mass data sets. FP rates are
given in FPs/image.

                FP rate estimated from mass data set           FP rate estimated from no-mass data set
Sensitivity     Unilateral CAD   Bilateral CAD   FP reduction   Unilateral CAD   Bilateral CAD   FP reduction
70%             0.70             0.53            24%            0.86             0.53            38%
80%             1.10             0.87            21%            1.32             1.04            21%
85%             1.46             1.15            21%            1.72             1.32            23%

Fig. 10. (a) Image-based and (b) case-based average test FROC curves from the unilateral and
bilateral CAD systems for detection on cases with malignant masses only. The FP rates were
estimated from the same data set.
Fig. 11. (a) Image-based and (b) case-based average test FROC curves from the unilateral and
bilateral CAD systems for detection on cases with malignant masses only. The FP rates were
estimated from the no-mass data set.


Table II. The average FP reduction rates for cases with malignant masses at case-based
sensitivities of 70%, 80%, and 85% for the test subsets when the FP rates were estimated from the
mass and no-mass data sets. FP rates are given in FPs/image.

                FP rate estimated from mass data set           FP rate estimated from no-mass data set
Sensitivity     Unilateral CAD   Bilateral CAD   FP reduction   Unilateral CAD   Bilateral CAD   FP reduction
70%             0.43             0.33            23%            0.58             0.35            40%
80%             0.78             0.62            21%            1.33             0.75            44%
85%             0.94             0.78            17%            1.63             0.95            42%

Table III. Estimation of the statistical significance in the difference between the FROC
performance of the unilateral and bilateral CAD systems on test subsets 1 and 2. The FP rates of
the FROC curves were estimated from the no-mass data set: (a) all cases and (b) malignant cases.

(a) All cases
FOM (JAFROC)       Test subset 1   Test subset 2
Unilateral CAD     0.52            0.48
Bilateral CAD      0.58            0.51
p value            <0.001          0.008

(b) Malignant cases only
FOM (JAFROC)       Test subset 1   Test subset 2
Unilateral CAD     0.56            0.53
Bilateral CAD      0.61            0.56
p value            0.009           0.003

mated  with  both  the  mass  and  no-mass  data  sets  for  cases 
with malignant masses only at several case-based sensitivities.
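The percentage reductions in the text and in Tables I and II follow from the relative change (unilateral - bilateral)/unilateral at each sensitivity. A short check, using the Table II FP rates (FPs/image, malignant cases only) transcribed by hand:

```python
# FP rates (FPs/image) transcribed from Table II at case-based sensitivities of 70%, 80%, 85%.
TABLE_II = {
    "mass data set": {"unilateral": (0.43, 0.78, 0.94), "bilateral": (0.33, 0.62, 0.78)},
    "no-mass data set": {"unilateral": (0.58, 1.33, 1.63), "bilateral": (0.35, 0.75, 0.95)},
}


def relative_reduction(before: float, after: float) -> float:
    """Fractional FP reduction of the bilateral system relative to the unilateral one."""
    return (before - after) / before


def reductions_percent(rates: dict) -> list:
    """Rounded percentage reduction at each tabulated sensitivity."""
    return [round(100 * relative_reduction(u, b))
            for u, b in zip(rates["unilateral"], rates["bilateral"])]


for name, rates in TABLE_II.items():
    print(name, reductions_percent(rates))
# mass data set [23, 21, 17]
# no-mass data set [40, 44, 42]
```

The no-mass column reproduces the 40%, 44%, and 42% reductions quoted for malignant masses.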

The figure-of-merit (FOM) from the output of the JAFROC software is summarized in Table III(a)
for all cases and in Table III(b) for malignant cases only. The difference between the FOMs for
the unilateral and the bilateral CAD systems was statistically significant (p < 0.05) for all
comparisons.

IV.  DISCUSSION 

Symmetry between breast structures in bilateral pairs of mammograms is an important feature used
by radiologists for mass detection or FP reduction. Similar structures that appear in both right
and left mammograms are more likely to be normal tissue than abnormal lesions. Our bilateral
analysis translates this radiological knowledge into computer vision techniques so that the CAD
system can use the symmetry of breast tissue on bilateral mammograms to improve detection
accuracy. The results of our study show that bilateral analysis is an effective technique for
reducing FPs.

The bilateral features are important factors affecting the performance of the bilateral LDA
classifier. In this study, the bilateral features were derived from features extracted from each
pair of ROIs, i.e., the mass candidate and its corresponding ROI, using the maximum-to-minimum
ratio strategy shown in Eq. (1). We also investigated whether other strategies, including
$BF[i,j] = MC[i,j]/CR[i,j]$, $BF[i,j] = (MC[i,j] - CR[i,j])/MC[i,j]$, and
$BF[i,j] = (MC[i,j] - CR[i,j])/[(MC[i,j] + CR[i,j])/2]$, could improve the performance of the
bilateral CAD system. These strategies were found to be less effective than the maximum-to-minimum
ratio. Specifically, 72% of the Az values of the bilateral features obtained with these alternative
strategies were lower than those of the corresponding features obtained by Eq. (1). The advantage
of bilateral symmetry measures defined by the maximum-to-minimum ratio can be seen from the
following example. Assume two highly asymmetric ROI pairs, $(MC_1, CR_1)$ and $(MC_2, CR_2)$, in
which $MC_1 > CR_1$ and $MC_2 < CR_2$. Their bilateral features derived as the maximum-to-minimum
ratio will both be greater than 1. However, the bilateral features obtained from
$BF[i,j] = MC[i,j]/CR[i,j]$ will be greater than 1 for $(MC_1, CR_1)$ but smaller than 1 for
$(MC_2, CR_2)$. The bilateral measures obtained from $BF[i,j] = (MC[i,j] - CR[i,j])/MC[i,j]$ or
$BF[i,j] = (MC[i,j] - CR[i,j])/[(MC[i,j] + CR[i,j])/2]$ will be positive for $(MC_1, CR_1)$ but
negative for $(MC_2, CR_2)$. The bilateral feature defined in Eq. (1) therefore describes the
asymmetry between the ROI pairs regardless of which ROI has the larger feature value, whereas the
other three bilateral features do not consistently provide feature values in the same direction.
The maximum-to-minimum ratio approach can thus achieve better performance than the other three
strategies.
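The direction argument above can be checked numerically. The sketch below assumes strictly positive feature values and reads Eq. (1) as the maximum-to-minimum ratio described in the text; it evaluates the four strategies on two hypothetical ROI pairs with the same degree of asymmetry but the larger value on opposite sides.

```python
def max_min_ratio(mc: float, cr: float) -> float:
    """Maximum-to-minimum ratio (the Eq. (1) strategy as described in the text)."""
    if mc <= 0 or cr <= 0:
        raise ValueError("feature values are assumed to be strictly positive")
    return max(mc, cr) / min(mc, cr)


def plain_ratio(mc: float, cr: float) -> float:
    return mc / cr


def diff_over_mc(mc: float, cr: float) -> float:
    return (mc - cr) / mc


def diff_over_mean(mc: float, cr: float) -> float:
    return (mc - cr) / ((mc + cr) / 2)


STRATEGIES = (max_min_ratio, plain_ratio, diff_over_mc, diff_over_mean)

# Hypothetical pairs: same degree of asymmetry, opposite ordering of MC and CR.
pair_1 = (4.0, 1.0)  # MC_1 > CR_1
pair_2 = (1.0, 4.0)  # MC_2 < CR_2
for f in STRATEGIES:
    print(f"{f.__name__:15s} {f(*pair_1):6.2f} {f(*pair_2):6.2f}")
```

Only `max_min_ratio` returns the same value (4.0) for both pairs; the plain ratio flips from 4.0 to 0.25, and the two difference-based measures change sign.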

The corresponding ROI registration is an important procedure in the bilateral analysis. The two
breasts of a given patient are not perfectly symmetrical, and other factors such as positioning
and compression introduce further variability in the symmetry. We investigated the effect of
variability in the registered ROI locations on bilateral analysis. For this purpose, the
prescreening step of our unilateral CAD system was first applied to the contralateral mammogram to
locate the mass candidates. For a given ROI predicted by the registration method on the
contralateral mammogram, its location was compared with the ROI locations of these mass candidates
by evaluating an overlap ratio, defined as the intersection area between the predicted ROI and a
mass candidate ROI relative to the area of the smaller of the two ROIs. If the overlap ratio of the
predicted ROI with a mass candidate ROI was greater than a chosen threshold, the predicted ROI was
moved to the location of the mass candidate ROI. If the predicted ROI overlapped with more than one
mass candidate ROI, the mass candidate ROI with the largest overlap ratio exceeding the threshold
was used. We evaluated the effects of this ROI location adjustment for a range of thresholds. When
the overlap ratio threshold was about 0.7 to 0.9, the bilateral CAD system showed a small but
statistically insignificant improvement compared with the bilateral CAD system without the ROI
adjustment process. When the overlap ratio threshold was smaller than 0.5, the performance of the
bilateral CAD system was degraded. This study indicated that small variability of the predicted
ROI location on the contralateral mammogram does not have a strong effect on the performance of
the bilateral analysis.
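A minimal sketch of the ROI location adjustment just described, assuming ROIs are axis-aligned rectangles given by corner coordinates. The `Box` type, the function names, and the default threshold of 0.8 (taken from within the 0.7 to 0.9 range reported above) are illustrative assumptions, not the study's implementation.

```python
from typing import NamedTuple, Optional, Sequence


class Box(NamedTuple):
    """Axis-aligned ROI with corners (x0, y0) and (x1, y1), in pixels."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection area relative to the area of the smaller of the two ROIs."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    if w <= 0 or h <= 0:
        return 0.0
    smaller = min(a.area(), b.area())
    return (w * h) / smaller if smaller > 0 else 0.0


def adjust_roi(predicted: Box, candidates: Sequence[Box], threshold: float = 0.8) -> Box:
    """Move the predicted ROI onto the prescreened candidate with the largest
    overlap ratio above the threshold; otherwise keep the predicted ROI."""
    best: Optional[Box] = None
    best_ratio = threshold
    for cand in candidates:
        r = overlap_ratio(predicted, cand)
        if r > best_ratio:
            best, best_ratio = cand, r
    return best if best is not None else predicted


predicted = Box(0, 0, 10, 10)
candidates = [Box(5, 5, 15, 15), Box(1, 1, 11, 11)]  # overlap ratios 0.25 and 0.81
print(adjust_roi(predicted, candidates, threshold=0.8))  # snaps to the second candidate
```

Normalizing by the smaller ROI means a small candidate lying entirely inside a larger predicted ROI scores 1.0, which matches the definition in the text.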

Various methods have been attempted for registration of mammograms of the same breast, for
example, the warping approach proposed by Sallam et al. and the multiple-control-point approach
proposed by Vujovic et al. Those approaches depended on the identification of corresponding control
points. However, there are few, if any, invariant landmarks on mammograms that can be identified
automatically because the breast is composed of soft tissue. The projected image of the breast
tissue often changes even when the same breast is compressed at two different times, and it is
even more variable between a breast and its contralateral breast. Commonly used rigid or nonrigid
registration methods are therefore not appropriate for this application. We instead developed the
regional registration method for correlation of ROIs on mammograms. Our regional registration
method uses the nipple location and the distance between the nipple and the ROI center as
relatively invariant information. The lesion in the target breast is estimated to be located
within a band of tissue centered along the arc traced with the nipple-to-lesion distance as the
radius and the nipple as the origin. This method emulates a technique used by many radiologists in
identifying corresponding lesions in two-view mammograms or in current and prior mammograms.
Van Engeland et al. compared methods for mammogram registration based on breast alignment and on
linear and nonlinear warping. They concluded that linear warping using mutual information performed
better than the other methods. We also performed a study comparing our regional registration
method with correlation-based or mutual-information-based linear and nonlinear warping methods
using a data set of 390 current and prior mammogram pairs.36 Our results showed that the regional
registration method outperformed the warping approaches in identifying corresponding lesions on
the mammogram pairs. The localization of symmetric ROIs on the bilateral breasts is similar to the
problem of registering ROIs on current and prior mammograms. We therefore adapted the regional
registration method to the bilateral analysis in this study.
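The geometric core of the regional registration method can be sketched as a band test: a location on the target mammogram is a possible match if its distance from the target nipple is within a tolerance of the nipple-to-lesion distance on the source mammogram. This is a simplified sketch, assuming coordinates in millimetres with both images in a common orientation; the band half-width is a hypothetical parameter, and the inward-nipple correction and any restriction of the band to breast tissue are omitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in mm


def in_registration_band(nipple_src: Point, lesion_src: Point,
                         nipple_dst: Point, candidate_dst: Point,
                         half_width: float) -> bool:
    """True if candidate_dst lies in the annular band of the given half-width
    centred on the arc of radius |lesion_src - nipple_src| around nipple_dst."""
    radius = math.dist(nipple_src, lesion_src)
    return abs(math.dist(nipple_dst, candidate_dst) - radius) <= half_width


def candidates_in_band(nipple_src: Point, lesion_src: Point, nipple_dst: Point,
                       candidates: Sequence[Point], half_width: float) -> List[Point]:
    """Keep only the target-image candidates that fall inside the band."""
    return [c for c in candidates
            if in_registration_band(nipple_src, lesion_src, nipple_dst, c, half_width)]


# Hypothetical example: lesion 50 mm from the nipple on the source image.
found = candidates_in_band((0.0, 0.0), (30.0, 40.0), (0.0, 0.0),
                           [(50.0, 0.0), (0.0, 58.0), (36.0, 36.0)], half_width=5.0)
print(found)  # [(50.0, 0.0), (36.0, 36.0)]
```

Using only the nipple and a radial distance avoids depending on control points, which is the point made above about the lack of invariant landmarks in compressed soft tissue.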

To implement the bilateral analysis in a practical CAD system, the nipple locations have to be
detected automatically. We have previously developed a nipple detection algorithm to determine the
nipple location on a mammogram. The algorithm detected the nipple within 1 cm of the manually
identified location in about 70% of the images in the data set used in this study. A large
deviation of the detected nipple location from the true location may impair the ability of the
regional registration technique to locate the symmetric ROI on the contralateral mammogram, which
in turn may degrade the performance of the bilateral analysis of tissue symmetry. We therefore used
the manually identified nipple locations in this study in order to develop the bilateral classifier
without the influence of other confounding factors. Further work is underway to improve the nipple
detection algorithm and to investigate the effect of nipple detection accuracy on the performance
of the bilateral system.
Inward projection of the nipple is often a result of positioning and compression problems in which
the nipple is not projected in profile. Because the two-dimensional projected mammograms do not
contain enough information to correct for the deformation of the breast, we designed a simple,
ad hoc correction method that allows the arc drawn with the nipple-to-mass distance as the radius
to intersect the breast boundary. In these cases, the breast image on the contralateral mammogram
often does not have a similar positioning problem, and the difference in the compression of the
two breasts may cause large uncertainty in the registration regardless of the correction method.
For cases in which both breasts actually have inward nipples and the breast images are similar,
our correction method will not cause additional errors, because a similar correction will be
applied to both mammograms of the bilateral pair and symmetric ROIs will be identified.

The motivation of this study is to reduce the FPs of a CAD system for mass detection. A CAD
detection system is generally intended for use in screening mammography. At the screening stage,
all lesions of concern should be pointed out to radiologists so that the radiologists can judge
whether a recall is warranted. If a detection system is trained to mark only malignant lesions, it
may be attempting to play the role of a triage system (alerting radiologists to work up only
“malignant” cases) rather than that of a second reader. Furthermore, because computerized lesion
detection or characterization on mammograms is not 100% sensitive, radiologists would not be able
to tell whether an unmarked suspicious lesion was missed or was considered benign by the computer.
We believe that computer-aided diagnosis (CADx) may be used in different ways in conjunction with
a CAD detection system. For example, the likelihood of malignancy may be estimated by the CADx
system and displayed for every detected lesion, and/or a CADx system may be used during diagnostic
workup. Either way, the CAD system will first alert radiologists to all masses, leaving the
assessment of malignancy or benignity to a second stage. We therefore included both malignant and
benign masses in the training sets to train the system to detect all masses.

V.  CONCLUSIONS 

We developed an FP reduction method to improve computerized mass detection on mammograms based on
analysis of bilateral information. We found that false positives can be reduced by training a new
classifier on bilateral features and combining its output score with the unilateral classifier
score. The bilateral CAD system achieved case-based sensitivities of 70%, 80%, and 85% for
detection of malignant masses at average FP rates of 0.35, 0.75, and 0.95 FPs/image, respectively,
on the test data set. Compared with the average FP rates of the unilateral CAD system at the same
sensitivities (0.58, 1.33, and 1.63 FPs/image, respectively), the FP rates were reduced by 40%,
44%, and 42% with the bilateral symmetry information. The improvement in the overall detection
accuracy is statistically significant (p < 0.05) by JAFROC analysis. Our results demonstrate that
bilateral analysis can differentiate the similarity and dissimilarity between tissues at
corresponding locations in the bilateral views and is useful for improving the performance of a
unilateral CAD system by further reducing the FPs.
ABSTRACT


Purpose: To develop a computer-aided detection (CAD) system that combines a dual-system approach
with a two-view fusion method to improve the accuracy of mass detection on mammograms.

Methods: We previously developed a dual CAD system that merged the decisions from two mass
detection systems operating in parallel, one trained with average masses and the other trained with
subtle masses, to improve sensitivity without excessively increasing false positives (FPs). In this
study, we further designed a two-view fusion method to combine the information from different
mammographic views. Mass candidates detected independently by the dual system on the two-view
mammograms were first identified as potential pairs based on a regional registration technique. A
similarity measure was designed to differentiate TP-TP pairs from other pairs (TP-FP and FP-FP
pairs) using paired morphological, Hessian, and texture features. A two-view fusion score for each
object was generated by weighting the similarity measure with the cross-correlation measure of the
object pair. Finally, a linear discriminant analysis (LDA) classifier was trained to combine the
mass likelihood score of the object from the single-view dual system and the two-view fusion score
for classification of masses and FPs. A total of 2332 mammograms from 735 subjects, including 800
normal mammograms from 200 normal subjects, were collected with Institutional Review Board (IRB)
approval.

Results: When the single-view CAD system trained with average masses only was applied to the test
sets, the average case-based sensitivities were 50.6% and 63.6% for average masses on current
mammograms and 22.6% and 36.2% for subtle masses on prior mammograms at 0.5 and 1 FPs/image,
respectively. With the new two-view dual-system approach, the average case-based sensitivities
improved to 67.4% and 83.7% for average masses and 44.8% and 57.0% for subtle masses at the same FP
rates.


Conclusions: The improvement with the proposed method was found to be statistically significant
(p < 0.0001) by JAFROC analysis.

Key  words:  computer-aided  detection,  breast  mass,  false  positive  reduction 
I. INTRODUCTION


In screening mammography, two mammographic views, the craniocaudal (CC) and mediolateral oblique
(MLO) views, are routinely acquired for each breast. During mammographic interpretation, the
radiologist combines the information from the two views and evaluates the changes from available
prior examinations to confirm true positives (TPs) and to reduce false positives (FPs). It has been
reported that screening mammography using two views per breast rather than one can increase cancer
detection sensitivity while decreasing the recall rate. Two-view screening mammography has become
the most common and standard method for breast cancer screening in developed countries.

Investigators have attempted to implement multiple-image techniques in CAD systems to improve the
accuracy of lesion analysis on mammograms. Kita et al. developed a method to find correspondences
between the CC and MLO views of the same breast. Their method was based on modeling the deformation
of the breast caused by compression in the different views. For a data set of 37 lesions, their
method could predict the location in the second view with an average minimum distance of
6.78 ± 5.85 mm between the correct position and an epipolar line. Paquerault et al.4 investigated a
two-view fusion scheme to improve the performance of a CAD system for mass detection. In their
preliminary study, the computer-detected object pairs in the two views were first identified by
using the distance between the nipple and the detected objects.5 A trained correspondence
classifier was then used to differentiate the TP-TP pairs from other pairs using extracted image
features. Finally, a fusion scheme that combined ranking and averaging of the prescreening and
correspondence scores was used to estimate a final mass score for each prescreened object. Using
169 pairs of mammograms, they found that the two-view fusion system achieved a significant
improvement compared with their single-view CAD system.
In a recent study, van Engeland et al.6 investigated a method in which a two-view classifier was
trained with both single-view and two-view features to distinguish TPs from normal structures,
instead of training a classifier to differentiate the object pairs. They evaluated the method using
948 cases and found that it mainly improved the image-based FROC curve in the high-specificity
range. However, no improvement was found in the case-based FROC curve, and they also pointed out
that their method may be less relevant when a CAD system is merely used to prompt regions at a high
false-positive rate. Sahiner et al.7 investigated the use of joint two-view information to improve
computerized microcalcification detection. Their two-view fusion method was trained and tested on a
total of 486 paired mammograms, and the improvement in detection was statistically significant for
both malignant and benign clusters. Zheng et al. proposed a two-view CAD system for masses that
aimed to reduce the FP rate at a given sensitivity level. They found that at a 74.4% case-based
sensitivity, their two-view approach reduced the FP rate by 23.7%. Qian et al.9 designed a method
for fusing detection results and image features from two views. On a data set of 200 normal
mammograms and 200 mammograms containing small (<10 mm) masses, they obtained a significantly
improved detection performance with their two-view mammogram analysis method. Recently, Velikova et
al.10 proposed a Bayesian network framework that used the dependences between the MLO and CC views
to obtain a single measure for estimating whether the mammographic view, the breast, and the case
contain a cancerous lesion. With the Bayesian network, they obtained a statistically significant
improvement compared with single-view analysis in estimating whether the view contains a malignant
mass. Furthermore, when the view-based results were combined using logistic regression to estimate
whether the breast or the case contains a malignant mass, the improvement was again statistically
significant.
The detection of masses on mammograms is a challenging task because overlapping fibroglandular
tissue may mimic a mass or obscure the lesion. Although researchers have devoted extensive efforts
to the development of CAD systems for mass detection, the performance of current CAD systems is far
from ideal. We have been developing various new techniques to improve the accuracy of mass
detection.11,12 In our previous study, we proposed a dual CAD system approach that combined two mass
detection systems in parallel, one trained with masses of average subtlety and the other with
subtle masses. The dual-system approach achieved significant improvement in the detection of both
average and subtle masses compared with the conventional single-system approach.13 We have also
demonstrated the feasibility of a new two-view analysis method for fusion of information from
different mammographic views.14 In this study, our purpose is to further improve the two-view fusion
method and to develop a CAD system that combines the dual-system approach with the two-view
approach. The effectiveness of the new two-view dual CAD system is evaluated with a relatively large
data set.

II.  MATERIALS  AND  METHODS 

2.1  Image  Data  Sets 

All mammograms in this study were collected retrospectively from patient files of the Department
of Radiology at the University of Michigan with Institutional Review Board (IRB) approval. The
mammograms were digitized with a LUMISYS 85 laser film scanner with a pixel size of
50 μm × 50 μm and 4096 gray levels. The full-resolution mammograms were first smoothed with a
2 × 2 box filter and subsampled by a factor of 2, resulting in images with a pixel size of
100 μm × 100 μm. These images were used as the input to the CAD system.
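The preprocessing step can be sketched with NumPy. Assuming the subsampling keeps one filtered value per non-overlapping 2 × 2 block, a 2 × 2 box filter followed by subsampling by 2 is equivalent to averaging each block, which converts 50 μm pixels to 100 μm pixels. How the original implementation handled odd image dimensions or rounding back to 12-bit integers is not stated; this sketch simply drops a trailing odd row or column and returns floating-point values.

```python
import numpy as np


def box2x2_subsample(img: np.ndarray) -> np.ndarray:
    """Average each non-overlapping 2x2 block of a 2-D image, i.e. apply a
    2x2 box filter and keep every second filtered pixel in each direction."""
    if img.ndim != 2:
        raise ValueError("expected a 2-D image")
    h, w = img.shape
    h2, w2 = h - h % 2, w - w % 2  # drop a trailing odd row/column
    blocks = img[:h2, :w2].astype(np.float64).reshape(h2 // 2, 2, w2 // 2, 2)
    return blocks.mean(axis=(1, 3))


# Toy 4x4 "mammogram" with integer pixel values (a real one would hold 12-bit data).
toy = np.arange(16, dtype=np.uint16).reshape(4, 4)
print(box2x2_subsample(toy))
```

The reshape exposes each 2 × 2 block along axes 1 and 3, so a single `mean` call replaces an explicit convolution followed by strided slicing.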

Two independent data sets of mammograms were collected for this study: a mass set with
biopsy-proven malignant or benign masses and a normal set containing bilateral mammograms. The mass
set contained 535 cases with 535 biopsy-proven masses, of which 345 cases included only current
mammograms and 190 cases included both current and prior mammograms. Of the 535 masses, 233 were
biopsy-proven to be malignant and 302 benign. Each case contained two mammographic views (the CC
view and either the MLO or the lateral view). The mass set included a total of 1532 mammograms:
1070 current mammograms and 462 prior mammograms, with 35 cases having two prior exams and 3 cases
having three prior exams. The true location of each mass was identified independently on each
mammographic view by an experienced MQSA-approved radiologist. The masses on the current mammograms
are referred to as “average” and the masses on prior exams as “subtle”, because many of the latter
may not show a well-perceived mass even on retrospective review. The normal data set contained 800
mammograms from 200 patients; each case included the CC and MLO views of both breasts. The normal
data set was used only for estimating the FP rate during testing. Figures 1 and 2 show the
histograms of mass size and visibility, respectively, for the mass set.
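The image counts above are mutually consistent. The check below assumes that every exam contributes exactly two views and that the 35 cases with two prior exams and the 3 cases with three prior exams are separate subsets of the 190 cases with priors; under that reading, the stated totals are reproduced.

```python
VIEWS_PER_EXAM = 2  # two views per exam (CC plus MLO or lateral)
MASS_CASES = 535
CASES_WITH_PRIORS = 190
NORMAL_CASES = 200

current = MASS_CASES * VIEWS_PER_EXAM
# Assumption: each case with priors has at least one prior exam; 35 cases add a
# second prior exam, and 3 cases add a second and a third.
prior_exams = CASES_WITH_PRIORS + 35 * 1 + 3 * 2
prior = prior_exams * VIEWS_PER_EXAM
normal = NORMAL_CASES * 4  # CC and MLO views of both breasts

print(current, prior, current + prior, current + prior + normal)  # 1070 462 1532 2332
```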

2.2  Methods 

Figure 3 shows a schematic of our dual CAD system with two-view analysis. The two-view dual-system
approach is described in detail below.

A.  Dual  CAD  System  Approach 
An important purpose of a CAD system is to serve as a second reader that alerts radiologists to
subtle cancers that may be overlooked. Because the lesions identified on prior mammograms upon
retrospective review represent difficult cases that are more likely to be overlooked by
radiologists if similar lesions occur on screening mammograms, it is important to improve the
sensitivity of the CAD system in detecting these lesions. On the other hand, when a CAD system is
applied to a new mammogram in clinical practice, it has to detect breast lesions of all degrees of
subtlety effectively. However, it is difficult to train a single CAD system to provide optimal
detection for lesions over the entire spectrum of subtlety, because the classifiers have to make
compromises to accommodate lesions with a wide range of characteristics.

We have developed a dual-system approach and demonstrated that it could improve the overall
performance of our CAD system.13 Briefly, the dual system is composed of two single CAD systems in
parallel. The two systems have the same architecture, which includes four processing steps:
(1) prescreening of mass candidates, (2) segmentation of suspicious objects, (3) feature extraction
and analysis, and (4) FP reduction by classification of normal tissue structures and masses. They
were optimized separately using two different training sets, one containing current mammograms
with “average” masses and the other prior mammograms with “subtle” masses. The two data sets did not
need to come from the same subjects. After the two single systems were trained separately, they
were trained together with a single training set for the dual-system information fusion step, which
uses an artificial neural network. For an unknown input mammogram, the two systems are applied in
parallel and each system estimates a mass likelihood score for every detected object; the trained
artificial neural network then merges the mass likelihood scores of the two single CAD systems for
a given object to differentiate true masses from FPs. The details can be found in the literature.13
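The network architecture of the dual-system fusion step is not described in this section. As a deliberately simplified stand-in, the sketch below uses a single logistic unit (the smallest possible neural network) trained by batch gradient descent to map the pair of mass likelihood scores from the two single systems to one fused score. The synthetic scores, learning rate, and epoch count are hypothetical and chosen only for illustration.

```python
import numpy as np


def _with_bias(scores: np.ndarray) -> np.ndarray:
    """Append a constant column so the last weight acts as a bias term."""
    return np.hstack([scores, np.ones((scores.shape[0], 1))])


def train_fusion_unit(scores: np.ndarray, labels: np.ndarray,
                      lr: float = 0.5, epochs: int = 3000) -> np.ndarray:
    """Fit weights [w_average, w_subtle, bias] of a logistic unit by batch gradient
    descent on the mean cross-entropy loss. scores: (n, 2); labels: (n,) in {0, 1}."""
    x = _with_bias(scores)
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        w -= lr * (x.T @ (p - labels)) / len(labels)
    return w


def fused_score(scores: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Fused mass likelihood in (0, 1) for each object."""
    return 1.0 / (1.0 + np.exp(-(_with_bias(scores) @ w)))


# Synthetic (score from "average" system, score from "subtle" system) pairs.
rng = np.random.default_rng(0)
masses = rng.normal([0.7, 0.6], 0.1, size=(200, 2))
fps = rng.normal([0.3, 0.35], 0.1, size=(200, 2))
scores = np.vstack([masses, fps])
labels = np.concatenate([np.ones(200), np.zeros(200)])

w = train_fusion_unit(scores, labels)
accuracy = np.mean((fused_score(scores, w) > 0.5) == labels)
print(f"training accuracy: {accuracy:.3f}")
```

A real fusion network would typically add a hidden layer and be evaluated on held-out cases; the point here is only that the two parallel scores enter a single trained combiner.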
The single-view dual system described above constitutes the first stage of the new two-view dual system in the current study. To perform the two-view analysis, a threshold was chosen to retain a small number of the most suspicious objects per mammographic view as input mass candidates to the two-view fusion stage, described next.
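The candidate selection can be sketched as a simple rank-and-truncate step. In this sketch, each view's detections are assumed to be given as (object id, dual-system score) tuples; `select_candidates` and its optional `min_score` cut-off are hypothetical names, and the default of 5 candidates per view is the criterion reported in Section 3.1 of the Results.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (object id, dual-system mass likelihood score)


def select_candidates(objects: List[Candidate], max_per_view: int = 5,
                      min_score: float = float("-inf")) -> List[Candidate]:
    """Retain at most `max_per_view` of the most suspicious objects on one view.

    Objects scoring below `min_score` are discarded first; the rest are ranked
    by score (highest first) and truncated.
    """
    kept = [obj for obj in objects if obj[1] >= min_score]
    kept.sort(key=lambda obj: obj[1], reverse=True)
    return kept[:max_per_view]


view = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)]
print(select_candidates(view, max_per_view=2))
```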

B.  Two-view  Information  Fusion 

The mass candidates on one view are paired with mass candidates on the other view by a regional registration method based on geometric criteria. The paired objects then undergo two-view similarity analysis to differentiate TP and FP pairs. The two-view analysis is based on two assumptions: (1) the likelihood of detecting a true mass on both views is higher than that of detecting the same FP on both views, and (2) corresponding true masses (TP-TP pairs) on two different mammographic views exhibit higher similarity than FP pairs (TP-FP pairs and FP-FP pairs) in terms of morphological features, texture features, and cross correlation.

The key process of our two-view CAD system is information fusion, in which the suspicious objects on different mammographic views are paired and a unique fusion score is generated for each individual object. Our two-view information fusion scheme consists of four steps: (1) regional registration using geometric information, (2) estimation of an image similarity measure between paired objects using cross correlation, (3) estimation of a feature similarity measure by designing a classifier to differentiate TP-TP pairs from other pairs, and (4) generation of the two-view fusion score. Figure 3 shows the block diagram of the two-view information fusion process for suspicious objects on the CC and MLO views of the same breast. Each step is described in detail below.

B.1 Regional registration
Because the highly deformable breast is compressed and invariant landmarks are lacking in most cases, it is virtually impossible to pinpoint corresponding locations on different views. We previously developed a regional registration method for locating the approximate positions of corresponding objects on mammograms acquired at different views4. From the geometry of mammographic image acquisition, an object seen on the CC view can appear only in a limited region of the MLO view, and vice versa. Radiologists at our institution routinely use the nipple-to-object distance (NOD) to estimate the correspondence between objects seen on different views of the same breast. We emulate this technique and use the NOD as the geometric matching criterion for the initial registration of potential pairs.

The regional registration is performed in a polar coordinate system whose origin is located at the nipple. Figure 4 illustrates the regional registration process for a suspicious object on the CC view. Using the distance NOD = R_c from the nipple N_c to the center O_c1 of the object on the CC view, an annular region bounded by two arcs of radii R_c ± ΔR is defined on the MLO view, centered at the nipple N_m. The radial width 2ΔR of the annular region was estimated from a large data set in our previous study5, giving an annulus that extends ±3 cm from R_c. Any suspicious objects on the MLO view that fall within the annular region are paired with the object O_c1 on the CC view. In this example, O_m1 and O_m2 are paired with O_c1. After the regional registration process has been performed for all suspicious objects detected on the CC view, a number of object pairs are generated, including true mass pairs (TP-TP pairs) and false pairs (FP-TP, TP-FP, and FP-FP pairs).
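The annular-region pairing can be sketched as follows, assuming object centers and nipple positions are already expressed in millimetres in each view's own image coordinates. The helper names and object labels are hypothetical; `delta_r` plays the role of ΔR (30 mm, the bound used in the Results).

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) position in millimetres


def nod(nipple: Point, center: Point) -> float:
    """Nipple-to-object distance (Euclidean), in millimetres."""
    return math.hypot(center[0] - nipple[0], center[1] - nipple[1])


def regional_pairs(cc_objects: Dict[str, Point], mlo_objects: Dict[str, Point],
                   nipple_cc: Point, nipple_mlo: Point,
                   delta_r: float = 30.0) -> List[Tuple[str, str]]:
    """Pair each CC object with every MLO object inside its annular region.

    The annulus on the MLO view is centred on the MLO nipple and spans radii
    R_c - delta_r to R_c + delta_r, where R_c is the CC object's NOD.
    """
    pairs = []
    for cc_id, cc_center in cc_objects.items():
        r_c = nod(nipple_cc, cc_center)
        for mlo_id, mlo_center in mlo_objects.items():
            if abs(nod(nipple_mlo, mlo_center) - r_c) <= delta_r:
                pairs.append((cc_id, mlo_id))
    return pairs


cc = {"Oc1": (50.0, 0.0)}
mlo = {"Om1": (0.0, 60.0), "Om2": (30.0, 40.0), "Om3": (90.0, 0.0)}
print(regional_pairs(cc, mlo, (0.0, 0.0), (0.0, 0.0)))
# -> [('Oc1', 'Om1'), ('Oc1', 'Om2')]
```

Only the radial distance from the nipple is compared, so the angular position of an object does not constrain the pairing; this is why several MLO objects can fall in the same annulus.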

We previously developed an automated nipple detection method15, but it did not locate the nipple correctly on all mammograms. To evaluate the feasibility of the two-view analysis method independently of nipple detection errors, we used manually identified nipple locations in this study.

B.2  Cross  correlation  measure 

In this step, a template matching approach is used to measure the similarity of the two objects in order to distinguish truly matched object pairs from incorrect object pairs. Cross correlation is a popular template matching method. A previous study from our laboratory found that cross correlation was superior to 11 other similarity measures for matching corresponding masses on serial mammograms1S. In this study, we therefore use cross correlation as the similarity measure to match the same mass appearing on different views. Assume that a mass candidate on the CC view has been paired with several detected objects in the annular region on the MLO view. For a given object pair, the suspicious regions on the CC and MLO views are denoted as I_c and I_m, respectively, where I_c is a box enclosing the mass candidate detected by the dual CAD system on the CC view, with its size determined by the segmentation of the object on that view. The region size therefore varies from one candidate object to another. Because the detected objects may not be centered in the bounding box, a 2 mm × 2 mm search region is defined, centered at the central location of the paired object on the MLO view. The center of the reference region I_c is placed within the search region and moved one pixel at a time over the entire search region. The cross correlation r between I_c and I_m, where I_m is a region of the same size as I_c centered at each location in the search region on the MLO view, is calculated as

$$ r = \frac{\sum_{i=1}^{n} \left(I_{c,i} - \bar{I}_c\right)\left(I_{m,i} - \bar{I}_m\right)}{\sqrt{\sum_{i=1}^{n} \left(I_{c,i} - \bar{I}_c\right)^2 \, \sum_{i=1}^{n} \left(I_{m,i} - \bar{I}_m\right)^2}} \qquad (1) $$

where I_{x,i} denotes the i-th pixel in the region I_x (x = c, m), n is the number of pixels in the reference object region I_c, and

$$ \bar{I}_x = \frac{1}{n} \sum_{i=1}^{n} I_{x,i} \qquad (2) $$

The cross correlation measure is defined as the maximum r value among all locations within the search region.
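A minimal NumPy sketch of the maximum-over-search-region cross correlation follows. It assumes that Eq. (1) is the standard mean-subtracted (Pearson-type) correlation coefficient, that the search half-width has already been converted from millimetres to pixels (a 2 mm × 2 mm region corresponds to a half-width of 1 mm divided by the pixel size), and that candidate windows extending past the image border are simply skipped; the function names are hypothetical.

```python
import numpy as np


def correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Mean-subtracted correlation coefficient of two equal-size regions."""
    da = a - a.mean()                      # I_c,i minus the region mean, cf. Eq. (2)
    db = b - b.mean()
    denom = np.sqrt((da * da).sum() * (db * db).sum())
    return float((da * db).sum() / denom) if denom > 0 else 0.0


def cross_correlation_measure(ref: np.ndarray, mlo: np.ndarray,
                              center: tuple, half_search: int) -> float:
    """Maximum correlation of `ref` (I_c) over a square search region on `mlo`.

    center      -- (row, col) of the paired MLO object's center
    half_search -- half-width of the search region in pixels (assumed known)
    """
    h, w = ref.shape
    r0, c0 = center
    best = -1.0
    for dr in range(-half_search, half_search + 1):
        for dc in range(-half_search, half_search + 1):
            top, left = r0 + dr - h // 2, c0 + dc - w // 2
            if top < 0 or left < 0 or top + h > mlo.shape[0] or left + w > mlo.shape[1]:
                continue                   # window would leave the image
            window = mlo[top:top + h, left:left + w]   # I_m, same size as I_c
            best = max(best, correlation(ref, window))
    return best


rng = np.random.default_rng(0)
image = rng.random((40, 40))
reference = image[10:20, 12:22].copy()     # region whose center is (15, 17)
print(cross_correlation_measure(reference, image, (16, 18), 2))
```

Because the search region is centered one pixel away from the true location, the exhaustive one-pixel shift recovers the exact match, which is the reason for searching rather than comparing only at the box centers.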

B.3  Two-view  similarity  classification 

We assumed that the features of the same mass on different views are more similar to each other than those of false pairs, so that true mass pairs (TP-TP pairs) can be distinguished from false pairs by feature classification in the combined space of similarity features.


Three groups of features (morphological, Hessian, and texture features) are extracted from each object. Similarity features are derived as the absolute difference and the mean of the corresponding features of each object pair. These similarity features, in combination with the geometric similarity (i.e., the difference in NOD between the paired objects), form the feature space for classifying true pairs versus false pairs. An LDA classifier was trained to estimate a two-view similarity score for each object pair, as detailed in Section D below.
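Constructing the similarity feature vector for one object pair can be sketched as below. The ordering of the features and the use of the absolute NOD difference are assumptions made for illustration; the function name is hypothetical.

```python
import numpy as np


def pair_similarity_features(f_cc, f_mlo, nod_cc: float, nod_mlo: float) -> np.ndarray:
    """Similarity feature vector for one CC-MLO object pair.

    f_cc, f_mlo -- per-object feature vectors (morphological, Hessian, texture)
    Returns [|f_cc - f_mlo|, (f_cc + f_mlo) / 2, |NOD_cc - NOD_mlo|].
    """
    f_cc = np.asarray(f_cc, dtype=float)
    f_mlo = np.asarray(f_mlo, dtype=float)
    diff = np.abs(f_cc - f_mlo)            # absolute-difference features
    mean = (f_cc + f_mlo) / 2.0            # mean features
    return np.concatenate([diff, mean, [abs(nod_cc - nod_mlo)]])


print(pair_similarity_features([1.0, 4.0], [3.0, 2.0], 50.0, 62.0))
```

The difference terms capture how alike the two objects are, while the mean terms retain how mass-like the pair is overall; both kinds of feature appear among the selected features reported in Section 3.3.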

A total of 13 morphological features are extracted as descriptors of the segmented mass shape: the area (number of pixels in the object), circularity, contrast, convexity, Fourier descriptor, normalized radial length (NRL) mean, NRL area ratio, NRL entropy, NRL standard deviation, NRL zero-crossing count, perimeter, perimeter-to-area ratio, and rectangularity. Their detailed definitions were described in our previous study17.
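Several of the NRL descriptors can be illustrated with a short sketch. It follows one common textbook formulation of the normalized radial length (radial distances from the centroid of the boundary points, normalized by their maximum). The number of histogram bins, the use of the boundary-point centroid, and the function name are assumptions; the exact definitions used in this work are those in the cited study.

```python
import numpy as np


def nrl_features(boundary: np.ndarray, bins: int = 100) -> dict:
    """Normalized radial length (NRL) statistics of a closed, ordered boundary.

    boundary -- (N, 2) array of boundary points (x, y) in contour order
    """
    centroid = boundary.mean(axis=0)             # centroid of the boundary points
    d = np.linalg.norm(boundary - centroid, axis=1)
    d = d / d.max()                              # NRL values in (0, 1]
    mean, std = d.mean(), d.std()
    hist, _ = np.histogram(d, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0] / d.size                  # probabilities of non-empty bins
    entropy = float(-(p * np.log2(p)).sum())
    centered = d - mean
    signs = np.sign(centered)
    # sign changes of (NRL - mean) around the closed contour
    zero_crossings = int(np.count_nonzero(signs != np.roll(signs, 1)))
    # area ratio: summed excess of NRL above its mean, normalized by N * mean
    area_ratio = float(centered[centered > 0].sum() / (d.size * mean))
    return {"mean": float(mean), "std": float(std), "entropy": entropy,
            "zero_crossings": zero_crossings, "area_ratio": area_ratio}


theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
ellipse = np.column_stack([20.0 * np.cos(theta), 10.0 * np.sin(theta)])
print(nrl_features(ellipse))
```

For a smooth ellipse the radial length rises and falls twice around the contour, giving four zero crossings; spiculated or lobulated margins produce many more, which is what makes this count useful as a shape descriptor.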
Hessian features are derived from the eigenvalues of Hessian matrices in the region of interest (ROI) containing a suspicious object in order to distinguish circular objects from other objects. The Hessian matrix for a 2D image f(x,y) is defined as


$$ H_f = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} \qquad (4) $$

where f_xx = ∂²f/∂x², f_xy = f_yx = ∂²f/∂x∂y, and f_yy = ∂²f/∂y². To enhance local structures of variable sizes and also reduce noise, f(x,y) is convolved with multiscale Gaussian filters having a range of standard deviations (δ_s = 4 mm to 10 mm) before the Hessian matrices are calculated. We designed a response function for mass enhancement at a location (x,y) and a given scale as


$$ R(f(x,y),\delta_s) = \begin{cases} \cdots, & \text{if } \cdots < 0 \\ 0, & \text{otherwise} \end{cases} \qquad (5) $$


where λ1 and λ2 are the eigenvalues of the Hessian matrix in Eq. (4), with |λ1| > |λ2|, at the scale of Gaussian filter δ_s. The Hessian feature at a location (x,y) is defined as the maximum response at that location among all scales. Three Hessian features are calculated for each object: the Hessian feature at the center of the ROI (H1), the maximum Hessian feature within the ROI (H2), and the difference between H1 and H2.
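The multiscale Hessian features can be sketched in NumPy as below. Several parts are assumptions made for illustration: the response used in the sketch, λ2²/|λ1| when both eigenvalues are negative (a bright-blob response of the kind used in dot-enhancement filters), is a stand-in and not necessarily the function of Eq. (5); the σ² scale normalization is also assumed; the scales are given in pixels (i.e., 4 mm to 10 mm divided by the pixel size); and the Gaussian smoothing is a simple separable convolution with edge padding rather than any particular library routine. The function names are hypothetical.

```python
import numpy as np


def gaussian_smooth(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian smoothing (kernel truncated at 3 sigma, edge padding)."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img.astype(float), radius, mode="edge")

    def smooth_1d(v):
        return np.convolve(v, kernel, mode="valid")

    rows = np.apply_along_axis(smooth_1d, 1, padded)   # smooth along x
    return np.apply_along_axis(smooth_1d, 0, rows)     # smooth along y


def hessian_eigenvalues(img: np.ndarray, sigma: float):
    """Eigenvalues (lam1, lam2) of the Hessian at each pixel, |lam1| >= |lam2|."""
    s = gaussian_smooth(img, sigma)
    fy, fx = np.gradient(s)                 # first derivatives (rows = y)
    fyy, fyx = np.gradient(fy)
    fxy, fxx = np.gradient(fx)
    fxy = 0.5 * (fxy + fyx)                 # enforce symmetry f_xy = f_yx
    half_trace = 0.5 * (fxx + fyy)
    disc = np.sqrt((0.5 * (fxx - fyy)) ** 2 + fxy ** 2)
    e_a, e_b = half_trace + disc, half_trace - disc
    swap = np.abs(e_b) > np.abs(e_a)
    return np.where(swap, e_b, e_a), np.where(swap, e_a, e_b)


def hessian_features(roi: np.ndarray, sigmas) -> tuple:
    """Return (H1, H2, H1 - H2) for one ROI, using an ASSUMED blob response."""
    best = np.zeros(roi.shape, dtype=float)
    for sigma in sigmas:
        lam1, lam2 = hessian_eigenvalues(roi, sigma)
        bright_blob = (lam1 < 0) & (lam2 < 0)
        # assumed response: sigma^2-normalized lam2^2 / |lam1| for bright blobs
        resp = sigma ** 2 * lam2 ** 2 / np.maximum(np.abs(lam1), 1e-12)
        best = np.maximum(best, np.where(bright_blob, resp, 0.0))
    h1 = float(best[roi.shape[0] // 2, roi.shape[1] // 2])   # at ROI center
    h2 = float(best.max())                                    # max within ROI
    return h1, h2, h1 - h2


yy, xx = np.mgrid[0:41, 0:41]
blob = np.exp(-((xx - 20) ** 2 + (yy - 20) ** 2) / (2 * 4.0 ** 2))
print(hessian_features(blob, [2.0, 3.0, 4.0]))
```

Taking the maximum over scales lets a single feature respond to masses of different sizes, which is the purpose of the multiscale filtering described above.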

The texture features are described by run length statistics (RLS) as follows. The rubber-band straightening transform (RBST) is applied to each object: a 60-pixel-wide band around the object margin is transformed into a rectangular image. A gradient magnitude image of the transformed rectangular object margin is derived by Sobel filtering. Five RLS texture features (short runs emphasis, long runs emphasis, gray level nonuniformity, run length nonuniformity, and run percentage) are extracted from the gradient image in both the horizontal and vertical directions, resulting in a total of 10 RLS texture features. Detailed definitions of the RBST and the RLS texture features for mammographic masses can be found in the literature18.
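The run-length step can be sketched as follows, assuming the RBST has already produced the straightened margin band as a 2D array. The quantization to 8 gray levels, the simple NumPy Sobel operator, and the function names are assumptions for illustration; the five statistics use the standard Galloway-style definitions, which may differ in normalization from those in the cited work.

```python
import numpy as np


def sobel_magnitude(img: np.ndarray) -> np.ndarray:
    """Gradient magnitude from 3x3 Sobel kernels (edge-padded)."""
    p = np.pad(img.astype(float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return np.hypot(gx, gy)


def quantize(img: np.ndarray, levels: int) -> np.ndarray:
    """Linearly map gray values to integer levels 0 .. levels-1."""
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros(img.shape, dtype=int)
    q = ((img - lo) / (hi - lo) * levels).astype(int)
    return np.clip(q, 0, levels - 1)


def run_length_matrix(q: np.ndarray, levels: int) -> np.ndarray:
    """m[g, L-1] = number of runs along rows with gray level g and length L."""
    m = np.zeros((levels, q.shape[1]), dtype=float)
    for row in q:
        start = 0
        for k in range(1, len(row) + 1):
            if k == len(row) or row[k] != row[start]:
                m[row[start], k - start - 1] += 1   # run ends just before k
                start = k
    return m


def rls_features(m: np.ndarray, n_pixels: int) -> np.ndarray:
    """SRE, LRE, GLN, RLN and RP of a run-length matrix."""
    runs = m.sum()
    length = np.arange(1, m.shape[1] + 1)          # run lengths 1 .. max
    sre = (m / length ** 2).sum() / runs            # short runs emphasis
    lre = (m * length ** 2).sum() / runs            # long runs emphasis
    gln = (m.sum(axis=1) ** 2).sum() / runs         # gray level nonuniformity
    rln = (m.sum(axis=0) ** 2).sum() / runs         # run length nonuniformity
    rp = runs / n_pixels                            # run percentage
    return np.array([sre, lre, gln, rln, rp])


def rls_texture_features(band: np.ndarray, levels: int = 8) -> np.ndarray:
    """10 RLS features: 5 along rows (horizontal) and 5 along columns (vertical)."""
    g = quantize(sobel_magnitude(band), levels)
    horizontal = rls_features(run_length_matrix(g, levels), g.size)
    vertical = rls_features(run_length_matrix(g.T, levels), g.size)
    return np.concatenate([horizontal, vertical])


band = np.random.default_rng(0).random((60, 20))   # stand-in for an RBST band
print(rls_texture_features(band))
```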

B.4 Generation of two-view fusion score

Because the correspondence between the locations of an object projected on different views cannot be determined accurately, several situations can occur. An object on one view may pair with a single object, with multiple objects, or with no object, depending on the number of objects within the annular region defined for it on the second view. Each object pair obtains a similarity score from the LDA classification. We designed a fusion method that assigns a unique score from the similarity analysis to each suspicious object on the first view. The similarity LDA score of an object pair is first weighted by (i.e., multiplied by) the cross correlation measure of the pair. If the object has only a single object pair, the weighted LDA score is used as its fusion score. For an object paired with multiple objects, the maximum weighted LDA score among all of its object pairs is chosen as the fusion score. For an object without any pair, the fusion score is set to -2.0 as a penalty; this value was chosen because it was slightly smaller than the minimum fusion score obtained in the training set.
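The fusion rule described above (weight each pair's LDA score by its cross correlation, take the maximum over pairs, otherwise apply the -2.0 penalty) can be sketched as follows; the tuple layout of `pairs` and the function name are assumptions for illustration.

```python
from typing import Dict, List, Tuple

PENALTY = -2.0  # fusion score for an object without any pair (from the text)

# (object id on first view, partner id, similarity LDA score, cross correlation)
Pair = Tuple[str, str, float, float]


def fusion_scores(pairs: List[Pair], objects: List[str],
                  penalty: float = PENALTY) -> Dict[str, float]:
    """Unique two-view fusion score for each object on the first view."""
    scores = {obj: penalty for obj in objects}   # unpaired objects keep penalty
    best: Dict[str, float] = {}
    for obj, _partner, lda, xcorr in pairs:
        weighted = lda * xcorr                   # weight LDA score by correlation
        best[obj] = max(best.get(obj, float("-inf")), weighted)
    scores.update(best)                          # single or multiple pairs: max
    return scores


example_pairs = [("A", "m1", 1.2, 0.5), ("A", "m2", 0.9, 0.9), ("B", "m1", -0.5, 0.4)]
print(fusion_scores(example_pairs, ["A", "B", "C"]))
```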

C.  Two-view  system  classifier 

In this final stage, we designed a third LDA classifier with two input features, the mass likelihood score from the single-view dual system detection stage and the fusion score from the two-view analysis, to distinguish masses from normal tissue on each view. The same two-view fusion process is applied to the mass candidates on each view, so that each view has a set of detected objects with individual scores at the output of this two-view system LDA classifier. The classifier training and testing processes are described below.
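As an illustration of a two-input linear discriminant, the sketch below trains a textbook Fisher LDA on two features (the dual-system score and the fusion score) and uses its output as the final object score. The function names, the midpoint bias, and the synthetic data are assumptions; the classifier actually used here was trained as described in Section D.

```python
import numpy as np


def train_fisher_lda(X, y):
    """Fisher linear discriminant returning weights w and bias b.

    X -- (n, 2) array: [dual-system score, two-view fusion score] per object
    y -- labels, 1 for mass and 0 for normal tissue (FP)
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    pos, neg = X[y == 1], X[y == 0]
    m1, m0 = pos.mean(axis=0), neg.mean(axis=0)
    # pooled within-class scatter matrix
    s_w = (np.cov(pos, rowvar=False) * (len(pos) - 1)
           + np.cov(neg, rowvar=False) * (len(neg) - 1))
    w = np.linalg.solve(s_w, m1 - m0)
    b = -0.5 * float(w @ (m1 + m0))   # threshold midway between class means
    return w, b


def lda_score(X, w, b):
    """Discriminant score; larger values indicate a more likely mass."""
    return np.asarray(X, dtype=float) @ w + b


rng = np.random.default_rng(0)
X = np.vstack([rng.normal([1.0, 1.0], 0.3, (50, 2)),    # synthetic masses
               rng.normal([0.0, 0.0], 0.3, (50, 2))])   # synthetic FPs
y = np.array([1] * 50 + [0] * 50)
w, b = train_fisher_lda(X, y)
scores = lda_score(X, w, b)
```

Because the FROC analysis sweeps a decision threshold over these scores, the choice of bias only shifts the operating points, not the shape of the resulting curve.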

D.  Training  and  testing 

To train and test the proposed computerized methods, we randomly separated the mass data sets by case into two independent subsets of approximately equal size. Two-fold cross validation was used for training and testing the algorithms. In each cross-validation cycle, we used the training subset for that cycle to select the optimal feature set and train the parameters of the classifiers for the single-view dual system, the two-view similarity analysis, and the two-view dual system. For each classifier, the classification accuracy for the training subset was optimized in terms of the area under the ROC curve, Az. The single-system LDA classifiers were trained to combine the multi-dimensional features into a mass likelihood score for each object in the single-view detection stage, and a neural network classifier was trained to merge the single-system scores into a dual-system score. The two-view fusion LDA classifier was trained to combine the multi-dimensional similarity features into a similarity measure for the paired objects. The two-view dual system LDA classifier was trained to differentiate TPs from FPs.
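Splitting by case, so that all images of one patient stay in the same subset, and then swapping the roles of the subsets for the second cycle can be sketched as below. The seed, the rounding for odd case counts, and the function names are assumptions.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def two_fold_case_split(case_ids: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split cases (not images) into two near-equal subsets."""
    ids = sorted(set(case_ids))       # each case appears once; its images stay together
    random.Random(seed).shuffle(ids)
    half = (len(ids) + 1) // 2
    return ids[:half], ids[half:]


def cross_validation_cycles(case_ids: Sequence[str],
                            seed: int = 0) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) case lists for the two cycles, with roles switched."""
    first, second = two_fold_case_split(case_ids, seed)
    yield first, second
    yield second, first


image_case_ids = ["c1", "c1", "c2", "c3", "c3", "c4", "c5"]   # one entry per image
for train, test in cross_validation_cycles(image_case_ids):
    print(train, test)
```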

The LDA classifiers for the single systems and the two-view similarity analysis were trained with feature selection. Our procedures for feature selection and classifier design have been described in detail elsewhere1119,20. Briefly, feature selection with stepwise LDA21 and simplex optimization was used to select the best feature subset and reduce the dimensionality of the feature space. The best combination of stepwise feature selection parameters, including the threshold values for feature entry, feature removal, and tolerance of feature correlation, was first chosen by a leave-one-case-out resampling method and a simplex optimization procedure within the training subset. The Az from the leave-one-case-out testing was used as the figure-of-merit to guide the search for the maximum in the parameter space. Using the best set of parameters and the training subset alone, a final stepwise feature selection was then performed to select a set of features, and the weights of the LDA were estimated.

Once training with one mass subset was completed, the parameters were fixed and applied to the cross-validation test subset. The entire training and testing processes were repeated for the other cross-validation cycle, in which the training and test subsets were switched. The set of normal mammograms was not used during training. The trained system from each cycle was applied to the normal set to estimate its FP rate on screening mammograms.

E.  Performance  analysis 

The detection performance of the two-view dual CAD system was assessed by free-response ROC (FROC) analysis. An FROC curve was obtained by plotting the mass detection sensitivity as a function of the number of FP marks per image at the corresponding decision threshold. The mass detection sensitivity was determined from the detected masses in the test mass subset, whereas the number of FP marks produced by the CAD system was determined from the detected objects on the normal cases only. FROC curves were presented on both a per-mammogram and a per-case basis. For image-based FROC analysis, the mass on each mammogram was considered an independent true object. For case-based FROC analysis, the same mass imaged on the two-view mammograms was considered to be one true object, and detection of the mass on either view or on both views was considered a TP detection. Since we used two-fold cross validation for training and testing, we obtained two test FROC curves, one for each test subset, for each condition (e.g., single-view or two-view approach). To compare the performance of the single-view and two-view CAD systems, we applied the jackknife free-response ROC (JAFROC) method developed by Chakraborty et al.22 to each pair of image-based FROC curves obtained with the two systems for the same test subset. To summarize the results for comparison, an average test FROC curve was derived for each condition by averaging the FP rates at the same sensitivity along the FROC curves of the two test subsets.
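The construction of FROC operating points, and the difference between image-based and case-based scoring, can be sketched as follows. The sketch assumes each true mass has already been reduced to the highest score of any CAD mark that hits it (negative infinity if it is never marked) and that all marks on normal images count as FPs; the hit criterion itself is not modeled, and the function names are hypothetical.

```python
import numpy as np


def froc_points(mass_scores, fp_scores, n_normal_images):
    """Return FROC operating points as (FPs per image, sensitivity) tuples.

    mass_scores     -- highest CAD score on each true mass (-inf if never marked)
    fp_scores       -- scores of all CAD marks on the normal images
    n_normal_images -- number of normal images used to estimate the FP rate
    """
    mass_scores = np.asarray(mass_scores, dtype=float)
    fp_scores = np.asarray(fp_scores, dtype=float)
    finite = mass_scores[np.isfinite(mass_scores)]
    thresholds = np.unique(np.concatenate([finite, fp_scores]))[::-1]
    points = []
    for t in thresholds:                          # sweep threshold high -> low
        sensitivity = float(np.mean(mass_scores >= t))
        fps_per_image = float(np.sum(fp_scores >= t)) / n_normal_images
        points.append((fps_per_image, sensitivity))
    return points


def case_based_scores(scores_by_view):
    """Case-based scoring: a mass counts as detected if found on either view."""
    return [max(view_scores) for view_scores in scores_by_view]


print(froc_points([0.9, 0.4, float("-inf")], [0.8, 0.3], n_normal_images=2))
```

For an image-based curve, each mass contributes one entry per view to `mass_scores`; for a case-based curve, `case_based_scores` first collapses the two view scores of each mass into one entry, which is why a second detection on the other view cannot raise the case-based sensitivity.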

III.  RESULTS 

3.1 Single-view dual CAD system

In the first step of our two-view analysis, our previously developed dual CAD system13 was used as the single-view system to detect mass candidates as input to the later stages. We experimentally chose a criterion of retaining at most the 5 most suspicious mass candidates per image from the single-view detection stage, as a compromise between a sensitivity high enough to retain the masses on both views for pairing and an FP rate that is not excessively high. With this criterion, the image-based and case-based sensitivities on the current mass set were 88.6% and 95.4%, respectively, while the corresponding sensitivities for the prior mass set were 71.3% and 80.7%, respectively.

3.2  Regional  registration 

In this study, we used the NOD to register the mass candidates identified by the single-view CAD system. Figure 5 shows the histogram of the NOD difference for the same mass identified by radiologists on different mammographic views. In our mass set, there were a total of 475 average masses on current mammograms and 107 subtle masses on prior mammograms that could be seen on both views. We used 30 mm as the upper bound for matching object pairs from the same breast, and thus the annular region was chosen to have a radial width of ±30 mm. Under this condition, 9 of the 475 average masses and 1 of the 107 subtle masses could not be paired correctly. The regional registration process generated a total of 8271 object pairs from the two mass subsets (an average of 10.8 object pairs in the CC and MLO views of a breast) and 4152 object pairs from the normal data set (an average of 10.4 object pairs in the two views of a breast). After the regional registration, we were able to match only 86.3% (410 of 475) of the mass pairs on current mammograms and 57.9% (62 of 107) of the mass pairs on prior mammograms. Of the average masses, 11.8% (56 of 475) were lost because one or both of the masses were missed by the dual CAD system, and only 1.9% (9 of 475) could not be matched because the difference in the NODs was larger than 30 mm. For the subtle masses on prior mammograms, the corresponding miss rates were 41.1% (44 of 107) and 0.9% (1 of 107), respectively.
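As a quick arithmetic check on the breakdown above, the counts for each mass set add up to the totals and reproduce the reported percentages when rounded to one decimal place. The function name is illustrative only.

```python
def registration_breakdown(total, matched, missed_by_cad, nod_too_large):
    """Percentages of mass pairs matched or lost, rounded to 0.1%."""
    # every mass pair is either matched or lost for one of the two reasons
    assert matched + missed_by_cad + nod_too_large == total
    return tuple(round(100.0 * n / total, 1)
                 for n in (matched, missed_by_cad, nod_too_large))


print(registration_breakdown(475, 410, 56, 9))   # -> (86.3, 11.8, 1.9)
print(registration_breakdown(107, 62, 44, 1))    # -> (57.9, 41.1, 0.9)
```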

3.3 Two-view similarity classification

For two-view similarity classification, the numbers of features selected from the two mass subsets were 6 (difference of NOD, average of segmented area, average of Hessian output, and three average RLS texture features) and 7 (difference of NOD, average of segmented area, average of Hessian output, difference of NRL entropy, and three average RLS texture features), respectively. Figure 6 shows the test ROC curves of the two-view similarity classifier for the two mass subsets obtained from cross-validation testing, with Az values of 0.87±0.01 and 0.88±0.01, respectively.

3.4  Detection  performance  comparison 

The test FROC curves for average masses on current mammograms are compared in Figure 7. The figures-of-merit (FOMs) and the p-values for the difference between pairs of image-based FROC curves under different conditions, estimated by JAFROC analysis, are tabulated in Table 1. Because of the multiple comparisons, the p-value required to achieve statistical significance is reduced to 0.002 (≈0.05/24) using the conservative Bonferroni correction23,24. All paired comparisons achieved statistical significance (p<0.002). When the single CAD system was applied to the test sets, the average case-based sensitivities for the average masses on current mammograms were 50.6% and 63.6% at 0.5 and 1.0 FPs/image, respectively. When the dual CAD system was applied to the test sets, the average case-based sensitivities for the average masses improved to 62.1% and 80.1%, respectively, at the same FP rates. With the proposed two-view dual system, the average case-based sensitivities were further improved to 67.4% and 83.7%, respectively, at the same FP rates.
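The Bonferroni threshold can be reproduced with a one-line calculation. Reading the 24 comparisons as 3 pairwise comparisons × 4 FROC curve sets (all and malignant cases, two test subsets each) × 2 tables is an interpretation consistent with Tables 1 and 2, not a statement from the text; the p-values below are copied from the dual versus two-view dual rows of the two tables (the "<0.0001" entry of Table 1 is omitted).

```python
ALPHA = 0.05
# 3 pairwise comparisons x 4 curve sets x 2 tables (interpretation, see text)
N_COMPARISONS = 3 * 4 * 2

threshold = ALPHA / N_COMPARISONS        # about 0.00208, quoted as 0.002

TABLE1_DUAL_VS_TWO_VIEW = [0.0003, 0.001, 0.0004]        # average masses
TABLE2_DUAL_VS_TWO_VIEW = [0.111, 0.078, 0.305, 0.219]   # subtle masses


def significant(p_values, alpha_corrected):
    """Flag each p-value that falls below the Bonferroni-corrected threshold."""
    return [p < alpha_corrected for p in p_values]


print(threshold, significant(TABLE1_DUAL_VS_TWO_VIEW, threshold),
      significant(TABLE2_DUAL_VS_TWO_VIEW, threshold))
```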

The improvement with the proposed approach was also analyzed for the subtle masses on prior mammograms (Figure 8). The FOMs and the p-values for the difference between pairs of image-based FROC curves under different conditions, estimated by JAFROC analysis for subtle masses, are tabulated in Table 2. The dual system and the two-view dual system had significantly higher (p<0.002) detection performance than the single system, whereas the difference between the dual system and the two-view dual system did not achieve statistical significance (p>0.002). When the single CAD system was applied to the test subsets, the average case-based sensitivities for the subtle masses on prior mammograms were 22.6% and 36.2% at 0.5 and 1.0 FPs/image, respectively. When the dual CAD system was applied to the test subsets, the average case-based sensitivities improved to 41.5% and 55.5%, respectively, at the same FP rates. With the proposed two-view dual system, the average case-based sensitivities for subtle masses were further improved to 44.8% and 57.0%, respectively, at the same FP rates.




Table 1. Estimation of the statistical significance of the difference between the FROC curves for three approaches: the single CAD system, the dual system, and the two-view dual system. The FROC curves with the FP marker rates obtained from the normal data set were compared. The pairs of image-based FROC curves were compared with JAFROC methodology. The figure-of-merit from JAFROC analysis for each curve is shown.


JAFROC analysis, FOM (average masses)

| JAFROC analysis | All cases, test subset 1 | All cases, test subset 2 | Malignant cases, test subset 1 | Malignant cases, test subset 2 |
|---|---|---|---|---|
| Single system | 0.63 | 0.63 | 0.58 | 0.60 |
| Dual system | 0.69 | 0.69 | 0.68 | 0.69 |
| p value | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| Dual system | 0.69 | 0.69 | 0.68 | 0.69 |
| Two-view dual system | 0.73 | 0.72 | 0.74 | 0.74 |
| p value | 0.0003 | 0.001 | <0.0001 | 0.0004 |
| Single system | 0.63 | 0.63 | 0.58 | 0.60 |
| Two-view dual system | 0.73 | 0.72 | 0.74 | 0.74 |
| p value | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
Table 2. Estimation of the statistical significance of the difference between the FROC curves for three approaches: the single CAD system, the dual system, and the two-view dual system. The FROC curves with the FP marker rates obtained from the normal data set were compared. The pairs of image-based FROC curves were compared with JAFROC methodology. The figure-of-merit from JAFROC analysis for each curve is shown.


JAFROC analysis, FOM (subtle masses)

| JAFROC analysis | All cases, test subset 1 | All cases, test subset 2 | Malignant cases, test subset 1 | Malignant cases, test subset 2 |
|---|---|---|---|---|
| Single system | 0.42 | 0.39 | 0.37 | 0.32 |
| Dual system | 0.48 | 0.46 | 0.48 | 0.45 |
| p value | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| Dual system | 0.48 | 0.46 | 0.48 | 0.45 |
| Two-view dual system | 0.52 | 0.49 | 0.52 | 0.48 |
| p value | 0.111 | 0.078 | 0.305 | 0.219 |
| Single system | 0.42 | 0.39 | 0.37 | 0.32 |
| Two-view dual system | 0.52 | 0.49 | 0.52 | 0.48 |
| p value | <0.0001 | <0.0001 | <0.0001 | <0.0001 |

IV. DISCUSSION AND CONCLUSION


We have been developing CAD methods for mass detection on mammograms. We previously designed a dual system approach to improve the overall performance of mass detection13. We also conducted a feasibility study of a new two-view analysis method14. In this study, we combined these two approaches into a two-view dual CAD system to further improve detection accuracy and evaluated its performance on a relatively large data set. Our results indicated that the proposed system significantly improved mass detection accuracy in comparison with the single CAD system and the dual CAD system for average masses, whereas the difference in performance between the two-view dual system and the single-view dual system did not achieve statistical significance for subtle masses.

The improvement achievable with two-view fusion analysis depends strongly on the sensitivity of the single-view detection stage. If a lesion is missed in the single-view detection, the two-view analysis cannot improve the sensitivity for it. We used the dual-system analysis as the first step in order to detect as many masses as possible (especially subtle masses) on single views. Although the improvement by dual-system analysis was substantial in comparison with the single CAD system, 110 masses (65 of 475 average masses and 45 of 107 subtle masses) still could not be matched after regional registration. The improvement achieved by the two-view analysis was therefore somewhat limited, especially for the subtle masses. For average masses on current mammograms, when we analyzed only the masses that could form TP-TP pairs during regional registration (410 in the average mass set), the average case-based sensitivities with the two-view dual system reached 73.4% and 85.7% at FP rates of 0.5 and 1.0 per image, respectively. Similarly, for the 62 such masses in the subtle mass set, the average case-based sensitivities reached 67.7% and 80.6% at the same FP rates. It can therefore be expected that the improvement by two-view analysis will be greater when the single-view detection system is further improved in the future.

It should be noted that the improvement in detection sensitivity obtained by two-view analysis differs from the apparent increase seen in case-based FROC analysis. In case-based FROC analysis, a mass is considered to be detected if it is detected on either one view or both views. With two-view analysis, there is a true improvement in detection sensitivity, as can be observed from the comparison of the image-based FROC curves. If an additionally detected mass is on the other view of a breast in which the mass is already counted as a TP in the case-based FROC curve for single-view analysis, this additional detection does not contribute to an improvement in the case-based FROC curve for two-view analysis. This is why the difference between the two case-based FROC curves for the single-view and two-view analyses is smaller than that observed between the two image-based FROC curves. However, we could not conduct a statistical comparison of the case-based FROC curves because the FPs from the two views might not be independent, and a statistical test is not yet available for this situation. Case-based performance is more commonly reported by researchers and CAD system manufacturers, so it is more often used for comparing detection performance between CAD systems. One should note that the actual image-based detection performance of two systems with similar case-based performance can be significantly different. For clinical applications, increasing the sensitivity by two-view analysis has a practical advantage, because radiologists have greater confidence that a lesion is a TP if the same lesion is detected on both views and are less likely to ignore the CAD mark. Dismissing correct CAD marks has been observed to be a major reason that some radiologists do not gain the benefit of using CAD.
In summary, we have developed a two-view dual CAD system to improve computerized detection of breast masses on mammograms. Our results indicate that the proposed CAD system significantly improved detection performance, as estimated by JAFROC analysis. The improvement by two-view analysis is strongly related to the performance of the single-view detection system, so the performance of the two-view dual system can potentially be improved further if the single-view CAD system is improved. We manually identified the nipple locations for the two-view analysis in this study. We will continue to improve the accuracy of our automated nipple detection method15 so that the two-view analysis can be fully automated in the future.
Figure Captions


Figure  1.  Distributions  of  the  mass  sizes  for  535  average  masses  identified  on  the 
current  mammograms  and  190  subtle  masses  identified  on  the  prior 
mammograms.  The  size  for  each  mass  is  measured  independently  as  the 
longest  diameter  on  each  mammographic  view  by  an  experienced  MQSA 
radiologist.  The  mean  sizes  are  15.0±7.7  mm  for  average  masses  and 
10.9±6.6  mm  for  subtle  masses,  respectively. 

Figure  2.  Histogram  of  the  mass  visibility  for  1070  average  masses  by  view  identified 
on  the  current  mammograms  and  462  subtle  masses  by  view  identified  on 
the  prior  mammograms.  The  visibility  is  evaluated  on  a  10-point  rating 
scale with 1 representing the most visible masses and 10 the most difficult
case relative to the cases seen in the radiologist's clinical practice. Each mass on a
mammogram  is  rated  independently  by  an  experienced  MQSA  radiologist. 
There  are  60  invisible  masses  on  current  mammograms  and  124  invisible 
masses  on  prior  mammograms. 

Figure 3. Schematic diagram of our dual-system two-view approach for mass
detection on mammograms. The system is developed for screening
mammography, in which all masses, whether malignant or benign, are
considered positive.

Figure  4.  Illustration  of  the  process  of  our  regional  registration  method  for  locating 
potential  object  pairs  on  CC  and  MLO  views. 


Figure 5. Distributions of the nipple-to-object distance (NOD) differences for the
same mass on different mammographic views identified by radiologists.


Figure  6.  The  test  ROC  curves  for  classification  of  TP-TP  pairs  from  other  pairs  on 
two  test  mass  subsets.  The  Az  values  for  the  two  mass  subsets  obtained 
from  cross-validation  testing  were  0.87±0.01  and  0.88±0.01,  respectively. 

Figure  7.  Comparison  of  the  average  test  FROC  curves  obtained  from  averaging  the 
FROC  curves  of  the  two  independent  subsets  for  average  masses  on  current 
mammograms. The FP rate was estimated from normal mammograms. (a)
Image-based FROC curves. (b) Case-based FROC curves.

Figure  8.  Comparison  of  the  average  test  FROC  curves  obtained  from  averaging  the 
FROC  curves  of  the  two  independent  subsets  for  subtle  masses  on  prior 
mammograms. The FP rate was estimated from normal mammograms. (a)
Image-based FROC curves. (b) Case-based FROC curves.
ABSTRACT

Several  full-field  digital  mammography  (FFDM)  systems  have  been  approved  for  clinical  applications.  It  is 
important  to  develop  a  CAD  system  that  can  easily  be  adapted  to  images  acquired  by  FFDM  systems  from  different 
manufacturers.  To  develop  a  CAD  system  that  is  independent  of  the  FFDM  manufacturer’s  proprietary  preprocessing 
methods,  we  used  the  raw  FFDM  image  as  input  and  developed  a  multi-resolution  preprocessing  scheme  for  image 
enhancement. Our CAD system performed prescreening to identify mass candidates, segmented the suspicious
structures, extracted morphological and texture features, and then classified the structures as masses or normal tissue. In this study,
we  investigated  the  use  of  a  two-stage  gradient  field  analysis  to  identify  suspicious  masses,  and  the  effectiveness  of  a 
new  gradient  field  feature  extracted  from  each  suspicious  object  for  false  positive  (FP)  reduction.  A  data  set  of  104 
cases  with  243  images  acquired  with  a  GE  FFDM  system  was  collected.  Most  cases  had  two  mammographic  views, 
except  for  12  cases  that  had  three  views  and  1  case  with  only  one  view.  The  data  set  contained  106  masses.  The  true 
locations of the masses were identified by an experienced radiologist. Using free-response receiver operating
characteristic (FROC) analysis, we found that our CAD system achieved case-based sensitivities of 70%, 80%, and
88% at 0.8, 1.3, and 1.7 FP marks/image, respectively. These results indicate the usefulness of the new
gradient field analysis method.

Keywords:  Computer-aided  diagnosis  (CAD),  Full  field  digital  mammography  (FFDM),  Gradient  field  analysis 


1.  INTRODUCTION 


Breast cancer is one of the leading causes of death among American women between 40 and 55 years of age1-4.
It has been reported that early diagnosis and treatment can significantly improve the chance of survival for patients with
breast cancer3-6. Although mammography is the best available screening tool for detection of breast cancers, studies
indicate that a substantial fraction of breast cancers that are visible upon retrospective analyses of the images are not
detected initially7-12. Computer-aided diagnosis (CAD) is considered to be one of the promising approaches that may
improve the sensitivity of mammography13,14. Computer-aided lesion detection can be used during screening to
reduce oversight of suspicious lesions that warrant further work-up. It has been shown that CAD can significantly improve
radiologists' detection accuracy15-17.


Most mammographic CAD algorithms developed so far are based on digitized mammograms. In the last
few  years,  full-field  digital  mammography  (FFDM)  technology  has  advanced  rapidly  because  of  the  potential  of  digital 
imaging  to  improve  breast  cancer  detection.  Several  FFDM  systems  have  become  commercially  available.  We  have 
developed  a  CAD  system  for  the  detection  of  masses  on  digitized  mammograms  in  our  previous  study18,19.  We  are 
developing  a  mass  detection  system  for  mammograms  acquired  directly  by  an  FFDM  system.  In  this  study,  we  are 
investigating  the  use  of  gradient  field  analysis  to  improve  the  performance  of  our  mass  detection  system  for  FFDMs. 
2. MATERIALS AND METHODS


2.1  Materials 

The  data  set  we  used  in  this  study  contained  104  cases  with  243  images.  All  the  data  were  collected  with 
institutional  review  board  (IRB)  approval.  The  raw  mammograms  in  this  data  set  were  acquired  with  a  GE  FFDM 
system at a pixel size of 100 µm × 100 µm and 14 bits per pixel. Most of the cases had two mammographic views, the
craniocaudal  (CC)  view  and  the  mediolateral  oblique  (MLO)  view  or  the  lateral  view,  except  for  12  cases  that  had  three 
views  and  1  case  with  only  one  view.  The  total  number  of  the  masses  in  this  data  set  is  106,  of  which  104  were 
biopsy-proven  and  2  were  followed  up.  The  true  locations  of  the  masses  were  identified  by  an  experienced  breast 
radiologist. 

2.2  Methods 

Our CAD system consists of five processing steps: 1) preprocessing by multi-scale enhancement, 2) prescreening
of mass candidates, 3) identification of suspicious objects, 4) extraction of feature parameters, and 5)
classification of normal and abnormal regions by rule-based and linear discriminant analysis (LDA)
classifiers. The block diagram for the scheme is shown in Figure 1.
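Step 5 relies on a linear discriminant. As a rough illustration only, the following NumPy sketch fits a two-class Fisher linear discriminant to feature vectors from masses and normal tissue. The function names `fit_lda` and `lda_score` and the small ridge term are hypothetical choices, and the rule-based stage and any feature selection are omitted.

```python
import numpy as np


def fit_lda(x_pos, x_neg, ridge=1e-6):
    """Two-class Fisher linear discriminant.

    x_pos, x_neg: (n_samples, n_features) feature vectors for masses and
    normal tissue. Returns (w, b) so that x @ w + b is larger for masses.
    """
    x_pos = np.asarray(x_pos, dtype=float)
    x_neg = np.asarray(x_neg, dtype=float)
    mu_p, mu_n = x_pos.mean(axis=0), x_neg.mean(axis=0)
    # Pooled within-class scatter (sum of the two class scatter matrices)
    s_w = np.atleast_2d(
        np.cov(x_pos, rowvar=False) * (len(x_pos) - 1)
        + np.cov(x_neg, rowvar=False) * (len(x_neg) - 1)
    )
    # Small ridge term (an assumption) keeps the solve stable for correlated features
    s_w += ridge * np.eye(s_w.shape[0])
    w = np.linalg.solve(s_w, mu_p - mu_n)
    # Decision threshold midway between the projected class means
    b = -0.5 * (mu_p + mu_n) @ w
    return w, b


def lda_score(x, w, b):
    """Discriminant score for each row of x."""
    return np.asarray(x, dtype=float) @ w + b
```

Thresholding `lda_score` at different values traces out the operating points later summarized by FROC curves.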

FFDMs  generally  are  pre-processed  with  proprietary  methods  before  being  displayed  to  readers.  The  image 
pre-processing  method  used  depends  on  the  manufacturer  of  the  FFDM  system.  In  an  effort  to  develop  a  CAD  system 
that  is  less  dependent  on  specific  FFDM  systems,  the  raw  digital  images  are  used  as  input  to  our  system.  A 
preprocessing  scheme  based  on  a  multi-resolution  method20  has  been  developed  for  image  enhancement.  This  scheme 
consists of three steps. First, the boundary of the breast is detected automatically by using Otsu's method21. Second,
the Laplacian pyramid is used to decompose the image into multiple scales, and a nonlinear weight function is designed to
enhance each high-pass component. Finally, the Gaussian pyramid is used to reconstruct the image from the enhanced scales. The block
diagram for the scheme is shown in Figure 2. The original and enhanced versions of an example mammogram
are shown in Figs. 3(a) and 3(b), respectively.
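The three-step enhancement described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the implementation used in this study: Otsu's threshold is computed from a gray-level histogram, the image is decomposed with a standard Laplacian pyramid built from a 5-tap binomial (approximately Gaussian) kernel, and the nonlinear weight function `enhance` (a power law with exponent below 1) is a hypothetical stand-in, since its actual form is not given here.

```python
import numpy as np

# 5-tap binomial kernel, a common approximation to a small Gaussian
_KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def otsu_threshold(img, nbins=256):
    """Gray level that maximizes Otsu's between-class variance."""
    hist, edges = np.histogram(np.ravel(img), bins=nbins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p = hist / hist.sum()
    w0 = np.cumsum(p)            # weight of the lower class for each cut
    mu = np.cumsum(p * centers)  # cumulative first moment
    denom = w0 * (1.0 - w0)
    sigma_b = np.zeros_like(denom)
    ok = denom > 1e-12
    sigma_b[ok] = (mu[-1] * w0[ok] - mu[ok]) ** 2 / denom[ok]
    # Pixels >= the returned value form the upper class
    return edges[np.argmax(sigma_b) + 1]


def _blur(img):
    """Separable 5-tap smoothing with edge replication."""
    pad = np.pad(img, 2, mode="edge")
    h, w = img.shape
    rows = sum(_KERNEL[k] * pad[k:k + h, :] for k in range(5))
    return sum(_KERNEL[k] * rows[:, k:k + w] for k in range(5))


def _reduce(img):
    return _blur(img)[::2, ::2]


def _expand(img, shape):
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * _blur(up)  # factor 4 restores the mean after zero insertion


def laplacian_pyramid(img, levels):
    gauss = [np.asarray(img, dtype=float)]
    for _ in range(levels):
        gauss.append(_reduce(gauss[-1]))
    bands = [g - _expand(g_next, g.shape) for g, g_next in zip(gauss[:-1], gauss[1:])]
    return bands, gauss[-1]


def reconstruct(bands, top):
    img = top
    for band in reversed(bands):
        img = band + _expand(img, band.shape)
    return img


def enhance(band, power=0.7):
    """Hypothetical nonlinear weight: power < 1 boosts small (subtle) coefficients
    relative to the largest one in the band."""
    m = np.abs(band).max()
    if m == 0:
        return band
    return np.sign(band) * m * (np.abs(band) / m) ** power


def multiscale_enhance(img, levels=4, power=0.7):
    bands, top = laplacian_pyramid(img, levels)
    return reconstruct([enhance(b, power) for b in bands], top)
```

With `power=1.0` the weight function is the identity and the image is reconstructed exactly, which is a convenient sanity check. Which side of the Otsu threshold corresponds to the breast depends on the raw-image convention and is left as an assumption here.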

In  our  previous  CAD  system  developed  on  digitized  screen-film  mammograms  (SFM),  an  adaptive  density- 
weighted  contrast  enhancement  (DWCE)  filter18  was  developed  for  prescreening.  Although  the  DWCE  filter  using  the 
gray  level  information  can  identify  the  suspicious  location  of  masses  on  mammograms  with  high  sensitivity,  the 
prescreening  objects  often  include  a  large  number  of  enhanced  normal  breast  structures.  In  this  study,  we  investigate 
the  use  of  a  new  method  that  combines  gradient  field  information  and  gray  level  information  to  detect  the  mass 
candidates on the FFDMs. Gradient field information is commonly used in computer vision and other fields to extract
objects or to characterize intensity field distributions. Kobatake et al22 designed a filter, referred to as an iris filter, to calculate the
convergence index of the gradient vectors around each pixel on SFMs, which provides shape information for the detection of masses.
An extension of the iris filter, referred to as an adaptive ring filter, was developed by Wei et al23 for detection of lung
nodules on chest x-ray images. In this study, we have developed a two-stage gradient field analysis method that
not only uses the shape information of masses on mammograms (an extension of the adaptive ring filter) but also
incorporates the gray level information by using a region growing technique in the second stage to refine the gradient
field analysis.
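To make the notion of gradient convergence concrete, the sketch below computes a simplified, fixed-radius convergence index at one candidate location: the mean cosine of the angle between each gradient vector in a ring around the center and the direction pointing back to the center. It is closer in spirit to a plain iris filter than to the adaptive ring filter or the two-stage method described above, and the function name and radii are illustrative assumptions.

```python
import numpy as np


def convergence_index(img, center, r_max, r_min=1.0):
    """Simplified gradient convergence index at `center` (row, col).

    Averages cos(theta) over pixels with r_min <= distance <= r_max, where theta is
    the angle between the image gradient and the vector pointing to `center`.
    Returns a value in [-1, 1]; values near 1 indicate gradients converging on
    the center, as expected for a bright, roughly round mass.
    """
    gy, gx = np.gradient(np.asarray(img, dtype=float))  # derivatives along rows, cols
    rows, cols = np.indices(gx.shape)
    dy = center[0] - rows
    dx = center[1] - cols
    dist = np.hypot(dx, dy)
    mag = np.hypot(gx, gy)
    ring = (dist >= r_min) & (dist <= r_max) & (mag > 0)
    if not ring.any():
        return 0.0
    cos = (gx[ring] * dx[ring] + gy[ring] * dy[ring]) / (mag[ring] * dist[ring])
    return float(cos.mean())
```

A bright, round blob yields a value near 1 because its gradients point toward the center, whereas a dark blob yields a value near -1. Evaluating this at every pixel is slow; a practical implementation would restrict the search or use a convolution-based formulation.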

After prescreening, the suspicious objects are identified by using a clustering-based region growing method.
Figures 3(c) and 3(d) show the initial detection locations and the grown objects, respectively. For each suspicious
object, eleven morphological features are extracted, and rule-based and linear classifiers are trained to remove the detected
normal structures that are substantially different from breast masses. Global and local multiresolution texture
analyses24,25 are performed in each region of interest by using the spatial gray level dependence matrix. A new gradient
field feature is extracted from each suspicious object and added to the feature space for false positive (FP) reduction.
Finally, LDA classification is used to identify potential breast masses. Figure 3(e) shows the final detected objects,
and Figure 3(f) shows the locations of these objects superimposed on the mammogram. Further details of
this algorithm can be found in the literature19.
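The texture step uses spatial gray level dependence (SGLD) matrices, also known as gray-level co-occurrence matrices. A minimal NumPy sketch for one displacement and three common features is shown below; the requantization to 16 gray levels and the chosen features are assumptions made for illustration, not the feature set of this study.

```python
import numpy as np


def sgld_matrix(roi, dy, dx, levels=16):
    """Symmetric, normalized SGLD (co-occurrence) matrix for displacement (dy, dx)."""
    roi = np.asarray(roi, dtype=float)
    lo, hi = roi.min(), roi.max()
    # Requantize to `levels` gray levels (an assumption made for this sketch)
    if hi > lo:
        q = np.floor((roi - lo) / (hi - lo) * (levels - 1) + 0.5).astype(int)
    else:
        q = np.zeros(roi.shape, dtype=int)
    h, w = q.shape
    # Overlapping windows so that a[k] and b[k] are pixel pairs separated by (dy, dx)
    r0, r1 = max(0, -dy), min(h, h - dy)
    c0, c1 = max(0, -dx), min(w, w - dx)
    a = q[r0:r1, c0:c1]
    b = q[r0 + dy:r1 + dy, c0 + dx:c1 + dx]
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1.0)
    m = m + m.T  # count each pair in both directions
    return m / m.sum()


def sgld_features(p):
    """A few classic texture measures computed from a normalized SGLD matrix."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]
    return {
        "energy": float((p ** 2).sum()),
        "contrast": float(((i - j) ** 2 * p).sum()),
        "entropy": float(-(nz * np.log2(nz)).sum()),
    }
```

Multiresolution texture analysis would repeat this for several pixel spacings and angles, for example displacements `(0, d)`, `(d, d)`, `(d, 0)` and `(d, -d)` for several `d`, and on ROIs at different resolutions.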
3. RESULTS


We randomly separated the cases in our data set into two independent, equal-sized data sets: the training data
set contained 52 cases with 120 images and the test data set contained 52 cases with 123 images. Both
the mass detection system with DWCE filtering and that with the two-stage gradient field analysis were trained with the
training set, and the performance of the two trained systems was compared using the test data set. Our CAD system with
the DWCE filter for prescreening of mass candidates achieved case-based sensitivities of 70% and 80% at 1.4 and 1.7
FP marks/image, respectively. When the DWCE filter was replaced by the gradient field analysis for prescreening, the
FP rate was reduced to 1.0 and 1.4 marks/image at sensitivities of 70% and 80%, respectively. After the addition of the
gradient field feature, the FP rate was further reduced to 0.8 and 1.3 marks/image, respectively, at these sensitivities.
Alternatively, the new method achieved a case-based sensitivity of 88% at 1.7 FP marks/image. Figures 4 and 5
show the comparison of performance by using image-based and case-based FROC curves, respectively.


4.  DISCUSSION  AND  CONCLUSIONS 

Several  FFDM  systems  have  been  approved  for  clinical  applications.  It  is  important  to  develop  a  CAD 
system  that  can  easily  be  adapted  to  images  acquired  by  FFDM  systems  from  different  manufacturers.  In  this  work, 
we  developed  a  CAD  system  that  uses  the  raw  FFDMs  as  the  input.  Our  previous  CAD  system  which  was  developed 
on  digitized  mammograms  was  adapted  to  FFDMs  by  using  a  new  prescreening  method  that  employed  gradient  field 
analysis  and  by  retraining  the  processing  parameters.  A  gradient  field  feature  was  extracted  for  further  false  positive 
reduction.  The  gradient  field  analysis  in  combination  with  the  gradient  field  feature  can  reduce  FPs  in  mass  detection 
on FFDMs. Our CAD system achieved case-based sensitivities of 70%, 80%, and 88% at 0.8, 1.3,
and 1.7 FP marks/image, respectively. Further study is underway to improve the CAD system using a larger data set.
Figure 1: The block diagram of the CAD algorithm for mass detection on FFDMs.


Figure  2:  The  block  diagram  for  preprocessing  of  raw  FFDM  images  by  multiscale  enhancement. 
(a) Original image (b) Preprocessed image (c) Prescreened image


(d)  Identified  suspicious  objects  (e)  Detection  result  (f)  Image  with  detected  objects 

Figure  3:  An  example  demonstrating  the  processing  steps  with  our  computer-aided  mass  detection  system. 
Figure 4: Image-based FROC curves. DWCE: prescreening using DWCE filter. GFA: prescreening using gradient field analysis.
GFA-Feature:  prescreening  using  gradient  field  analysis  and  the  addition  of  the  gradient  field  feature  for  FP  reduction. 
Figure 5: Case-based FROC curves. DWCE: prescreening using DWCE filter. GFA: prescreening using gradient field analysis.
GFA-Feature: prescreening using gradient field analysis and the addition of the gradient field feature for FP reduction.
ABSTRACT

We  have  developed  a  computer-aided  detection  (CAD)  system  for  breast  masses  on  mammograms.  In  this 
study,  our  purpose  was  to  improve  the  performance  of  our  mass  detection  system  by  using  a  new  dual  system  approach 
which combines a CAD system optimized with “average” masses with another CAD system optimized with subtle
masses.  The  latter  system  is  trained  to  provide  high  sensitivity  in  detecting  subtle  masses.  For  an  unknown 
mammogram,  the  two  systems  are  used  in  parallel  to  detect  suspicious  objects.  A  feed-forward  backpropagation 
neural  network  trained  to  merge  the  scores  of  the  two  linear  discriminant  analysis  (LDA)  classifiers  from  the  two 
systems  makes  the  final  decision  in  differentiation  of  true  masses  from  normal  tissue.  A  data  set  of  86  patients 
containing  172  mammograms  with  biopsy-proven  masses  was  partitioned  into  a  training  set  and  an  independent  test  set. 
This  data  set  is  referred  to  as  the  average  data  set.  A  second  data  set  of  214  prior  mammograms  was  used  for  training 
the  second  CAD  system  for  detection  of  subtle  masses.  When  the  single  CAD  system  trained  on  the  average  data  set 
was  applied  to  the  test  set,  the  Az  for  false  positive  (FP)  classification  was  0.81  and  the  FP  rates  were  2.1,  1.5  and  1.3 
FPs/image  at  the  case-based  sensitivities  of  95%,  90%  and  85%,  respectively.  With  the  dual  CAD  system,  the  Az  was 
0.85  and  the  FP  rates  were  improved  to  1.7,  1.2  and  0.8  FPs/image  at  the  same  case-based  sensitivities.  Our  results 
indicate  that  the  dual  CAD  system  can  improve  the  performance  of  mass  detection  on  mammograms. 

Keywords:  computer-aided  detection  (CAD),  mass  detection,  dual  CAD  system 


1.  INTRODUCTION 

Breast cancer is one of the leading causes of death among American women between 40 and 55 years of age1.
It has been reported that early diagnosis and treatment can significantly improve the chance of survival for patients with
breast cancer2-4. Although mammography is the best available screening tool for detection of breast cancers, studies
indicate that a substantial fraction of breast cancers that are visible upon retrospective analyses of the images are not
detected initially5-7. Computer-aided detection (CAD) is considered to be one of the promising approaches that may
improve the sensitivity of detecting early breast cancer in screening mammography. It has been shown that CAD can
increase the cancer detection rate by radiologists both in the laboratory and in clinical practice8-13.


We  have  been  developing  CAD  systems  for  detection  and  characterization  of  mammographic  masses  and 
microcalcifications.  Detection  of  masses  on  mammograms  is  more  challenging  than  detection  of  microcalcifications 
because the normal fibroglandular tissue in the breast causes false positives (FPs) by mimicking masses and causes false
negatives by overlapping with the lesions. Therefore, mass detection systems generally have lower sensitivity and
higher FP rates than microcalcification detection systems. In this study, we are investigating the effectiveness of a dual
system  approach  for  improving  the  performance  of  mass  detection  on  mammograms. 
2. MATERIALS AND METHODS


2.1  Materials 

The  data  set  we  used  in  this  study  contained  86  cases.  Each  case  included  the  current  mammograms  that  were 
obtained  before  biopsy  and  the  prior  mammograms  obtained  from  previous  exams.  The  prior  mammograms  were  used 
for  training  the  second  system  because  masses  on  prior  mammograms  are  generally  more  subtle  than  those  on  current 
mammograms.  The  subtle  mass  set  does  not  have  to  be  obtained  from  the  same  cases  as  the  average  mass  set.  The 
current  set  contained  172  mammograms  and  the  prior  set  contained  214  mammograms.  All  data  were  collected  with 
Institutional  Review  Board  (IRB)  approval.  The  mammograms  in  this  data  set  were  digitized  by  a  Lumiscan  laser 
scanner with a pixel size of 100 µm × 100 µm and 12 bits per pixel. All of the current cases had two
mammographic  views:  the  craniocaudal  (CC)  view  and  the  mediolateral  oblique  (MLO)  view  or  the  lateral  view. 
There  were  86  biopsy-proven  masses  in  this  data  set.  The  true  locations  of  the  masses  were  identified  by  an 
experienced  MQSA  radiologist. 

2.2  Methods 

In  order  to  improve  the  performance  of  our  CAD  system  for  detection  of  subtle  masses,  we  developed  a  new 
dual system approach, which combines a system trained with “average” masses with another system trained with subtle
masses.  When  the  trained  dual  system  is  applied  to  an  unknown  mammogram,  the  two  CAD  systems  are  used  in  parallel 
to  detect  suspicious  objects  on  a  single  mammogram.  No  prior  mammogram  is  needed.  The  additional  FPs  from  the 
use  of  two  systems  are  reduced  by  feature  classification  in  an  information  fusion  stage.  Figure  1  shows  the  block 
diagram  for  the  dual  system. 


Figure  1.  The  block  diagram  of  the  dual  CAD  system  for  mass  detection  on  mammograms. 


Our single CAD system consists of five processing steps: 1) digitization, 2) prescreening of mass candidates,
3) identification of suspicious objects, 4) extraction of feature parameters, and 5) classification of normal and
abnormal regions by rule-based and LDA classifiers. The block diagram for the single CAD system is
shown in Figure 2. Figure 3 shows an example demonstrating the processing steps of our computer-aided mass
detection system. For the prescreening stage, we have developed a two-stage gradient field analysis method that
not only uses the shape information of masses on mammograms but also incorporates the gray level information of the
local object segmented by a region growing technique in the second stage to refine the gradient field analysis14,15. The
gradient  field  analysis  was  used  to  determine  locations  of  high  convergence  of  radial  gradient  in  the  image.  A  region 
of  interest  (ROI)  of  256x256  pixels  is  then  identified  with  its  center  placed  at  each  location  of  high  gradient 
convergence.  The  object  in  each  ROI  is  segmented  by  a  region  growing  method16  in  which  the  location  of  high 
gradient  convergence  is  used  as  the  starting  point.  Figures  3(b)  and  3(c)  show  the  initial  detection  locations  and  the 
grown  objects,  respectively.  After  region  growing,  all  connected  pixels  constituting  the  object  are  labeled.  Finally, 
the  gradient  convergence  at  the  center  location  of  the  ROI  is  recalculated  within  the  segmented  object.  The  objects 
whose  new  gradient  convergence  is  lower  than  80%  of  the  original  value  are  rejected.  After  prescreening,  the 
suspicious  objects  are  identified  by  using  a  clustering-based  region  growing  method.  For  each  suspicious  object, 
eleven  morphological  features  are  extracted.  Rule-based  and  LDA  classifiers  are  trained  to  remove  the  detected 
normal  structures  that  are  substantially  different  from  breast  masses.  Global  and  local  multiresolution  texture 
analyses17,18 are performed in each ROI by using the spatial gray level dependence matrices at different pixel spacings
and angular directions. To obtain the best feature subset and reduce the dimensionality of the feature space for
designing a robust classifier, feature selection with stepwise linear discriminant analysis is applied. Finally, LDA
classification  is  used  to  identify  potential  breast  masses.  Figure  3(d)  shows  the  final  detected  objects,  and  Figure  3(e) 
shows  the  locations  of  these  objects  superimposed  on  the  mammogram. 
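The second-stage check described above (regrow the object from the high-convergence location, recompute the convergence inside it, and reject the candidate if it falls below 80% of the original value) can be sketched as follows. The 4-connected intensity-threshold region growing, the fixed ring radius, and the helper names are hypothetical simplifications; the region growing method16 used in the study is not reproduced here.

```python
from collections import deque

import numpy as np


def grow_region(img, seed, thresh):
    """4-connected region growing from `seed`, keeping pixels with img >= thresh."""
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    if img[seed] < thresh:
        return mask
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] and img[nr, nc] >= thresh:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask


def convergence(img, center, r_max, region=None):
    """Mean cosine between gradients and the direction to `center` within r_max."""
    gy, gx = np.gradient(img.astype(float))
    rows, cols = np.indices(img.shape)
    dy, dx = center[0] - rows, center[1] - cols
    dist, mag = np.hypot(dx, dy), np.hypot(gx, gy)
    sel = (dist >= 1) & (dist <= r_max) & (mag > 0)
    if region is not None:
        sel &= region  # restrict to the segmented object
    if not sel.any():
        return 0.0
    return float(((gx * dx + gy * dy)[sel] / (mag * dist)[sel]).mean())


def keep_candidate(img, seed, r_max, thresh, ratio=0.8):
    """Reject a candidate whose convergence inside the grown object drops below ratio x original."""
    c0 = convergence(img, seed, r_max)
    region = grow_region(img, seed, thresh)
    c1 = convergence(img, seed, r_max, region)
    return c1 >= ratio * c0, c0, c1
```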


Figure  2.  The  block  diagram  of  a  single  CAD  system  for  mass  detection  on  mammograms. 


The  two  single  CAD  systems  were  independently  trained  with  the  “average”  mass  set  and  the  subtle  mass  set, 
respectively.  To  merge  the  information  from  the  two  CAD  systems,  the  two  LDA  discriminant  scores  from  the  two 
CAD  systems  were  used  to  define  a  new  feature  space.  A  feed-forward  backpropagation  neural  network  with  3  hidden 
nodes  was  then  trained  using  the  LDA  feature  scores  of  the  training  sets  as  input  to  differentiate  true  masses  from 
normal  tissue.  After  the  dual  CAD  system  was  trained,  its  performance  was  evaluated  on  the  independent  test  set  and 
compared  with  that  of  the  single  CAD  system. 
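A minimal sketch of the fusion stage follows: a feed-forward network with two inputs (the two LDA scores) and 3 hidden nodes, trained by backpropagation with plain full-batch gradient descent in NumPy. The cross-entropy loss, sigmoid activations, learning rate, and epoch count are assumptions for illustration; they are not stated here for the study.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_fusion_net(scores, labels, hidden=3, lr=0.5, epochs=2000, seed=0):
    """Train a 2-input, `hidden`-node, 1-output network on LDA score pairs.

    scores: (n, 2) array of LDA scores from the two CAD systems.
    labels: (n,) array, 1 for true masses and 0 for normal tissue.
    Returns a function mapping score pairs to a fused output in (0, 1).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float).reshape(-1, 1)
    w1 = rng.normal(0.0, 0.5, (x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):
        # Forward pass
        h = _sigmoid(x @ w1 + b1)
        out = _sigmoid(h @ w2 + b2)
        # Backward pass: gradients of mean cross-entropy w.r.t. pre-activations
        d_out = (out - y) / len(x)
        d_h = (d_out @ w2.T) * h * (1.0 - h)
        # Gradient-descent updates
        w2 -= lr * h.T @ d_out
        b2 -= lr * d_out.sum(axis=0)
        w1 -= lr * x.T @ d_h
        b1 -= lr * d_h.sum(axis=0)

    def predict(s):
        s = np.atleast_2d(np.asarray(s, dtype=float))
        return _sigmoid(_sigmoid(s @ w1 + b1) @ w2 + b2).ravel()

    return predict
```

The fused output plays the role of a single decision score, so thresholding it at different values yields the dual-system FROC curve.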
(c) Identified suspicious objects (d) Detection result (e) Image with detected objects


Figure  3.  An  example  demonstrating  the  processing  steps  with  our  single  CAD  system  for  mass  detection. 


3.  RESULTS 

We randomly separated the cases in our data set into two independent, equal-sized data sets, each with 43 cases.
The training and testing were performed using 2-fold cross-validation. The detection performance of the CAD
system  was  assessed  by  free  response  receiver  operating  characteristic  (FROC)  analysis.  FROC  curves  were  presented 
on  a  per-mammogram  and  a  per-case  basis.  For  mammogram-based  FROC  analysis,  the  mass  on  each  mammogram 
was  considered  an  independent  true  object;  the  sensitivity  was  thus  calculated  relative  to  86  masses.  For  case-based 
FROC  analysis,  the  same  mass  imaged  on  the  two-view  mammograms  was  considered  to  be  one  true  object  and  the 
detection  of  either  or  both  masses  on  the  two  views  was  considered  to  be  a  true-positive  (TP);  the  sensitivity  was  thus 
calculated  relative  to  43  masses.  The  average  test  FROC  curve  was  obtained  from  averaging  the  FP  rates  at  the  same 
sensitivity  along  the  two  corresponding  test  FROC  curves  from  the  2-fold  cross  validation.  When  the  single  CAD 
system  trained  on  the  average  data  set  was  applied  to  the  test  set,  the  Az  for  FP  classification  was  0.81  and  the 
FPs/image  were  2.1,  1.5  and  1.3  at  the  case-based  sensitivities  of  95%,  90%  and  85%,  respectively.  With  the  dual 
CAD system, the Az was 0.85 and the FP rates were improved to 1.7, 1.2 and 0.8 FPs/image at the same case-based
sensitivities. Figures 4 and 5 show the comparison of the test performance of the single and dual CAD systems by
using image-based and case-based average FROC curves, respectively.
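The difference between mammogram-based and case-based sensitivity, and the averaging of FP rates at a fixed sensitivity across the two test folds, can be illustrated with the following sketch. It assumes one mass per case (as in this data set) and represents each CAD mark as a hypothetical `(case_id, view, score, is_tp)` tuple; FP rates are normalized by the number of images in the test set.

```python
import numpy as np


def froc_curve(marks, mass_views, n_images, case_based=True):
    """Sensitivity and FP/image at every distinct score threshold.

    marks: iterable of (case_id, view, score, is_tp) tuples for CAD marks.
    mass_views: set of (case_id, view) images that contain the (single) mass.
    """
    marks = list(marks)
    cases = {c for c, _ in mass_views}
    sens, fps = [], []
    for t in sorted({m[2] for m in marks}, reverse=True):
        kept = [m for m in marks if m[2] >= t]
        fps.append(sum(not m[3] for m in kept) / n_images)
        if case_based:
            # A case is a TP if its mass is marked on either view
            hit = {m[0] for m in kept if m[3]} & cases
            sens.append(len(hit) / len(cases))
        else:
            # Each view of the mass is an independent true object
            hit = {(m[0], m[1]) for m in kept if m[3]} & mass_views
            sens.append(len(hit) / len(mass_views))
    return np.array(sens), np.array(fps)


def fp_at_sensitivity(sens, fps, target):
    """FP/image at a target sensitivity by linear interpolation along the curve."""
    uniq = np.unique(sens)
    # Lowest FP rate reached at each distinct sensitivity (keeps np.interp's x increasing)
    best = np.array([fps[sens == s].min() for s in uniq])
    return float(np.interp(target, uniq, best))


def average_fp_at_sensitivity(curves, target):
    """Average the FP rates of several test FROC curves at the same sensitivity."""
    return float(np.mean([fp_at_sensitivity(s, f, target) for s, f in curves]))
```

In this setting, where each mass appears on both views, case-based sensitivity is never lower than mammogram-based sensitivity at the same threshold. Note that `np.interp` clamps targets outside the observed sensitivity range, so the average is only meaningful within the range reached by both curves.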




Figure  4.  Image-based  average  FROC  curves  obtained  from  averaging  the  corresponding  FROC 
curves  of  the  two  test  subsets.  Single:  detection  by  the  single  CAD  system.  Dual: 
detection  by  the  dual  CAD  system. 



Figure  5.  Case-based  average  FROC  curves  obtained  from  averaging  the  corresponding  FROC 
curves  of  the  two  test  subsets.  Single:  detection  by  the  single  CAD  system.  Dual: 
detection  by  the  dual  CAD  system. 
4. DISCUSSION AND CONCLUSIONS


We  previously  developed  a  CAD  system  for  detection  of  masses  on  mammograms.  However,  we  found  that 
it  is  difficult  to  train  a  single  system  to  provide  optimal  detection  for  all  lesions  over  the  entire  spectrum  of  subtlety.  In 
this  study,  we  developed  a  dual  system  which  combines  a  system  trained  with  subtle  lesions  on  prior  mammograms  and 
a  system  trained  with  masses  detected  on  current  mammograms.  It  was  found  that  the  dual  CAD  system  could  achieve 
a  higher  accuracy  than  the  single  CAD  system.  Further  study  is  underway  to  optimize  the  fusion  scheme  in  our  dual 
system. 
ABSTRACT

We  are  developing  a  two-view  information  fusion  method  to  improve  the  performance  of  our  CAD  system  for 
mass  detection.  Mass  candidates  on  each  mammogram  were  first  detected  with  our  single-view  CAD  system. 
Potential  object  pairs  on  the  two-view  mammograms  were  then  identified  by  using  the  distance  between  the  object  and 
the nipple. Morphological features, a Hessian feature, correlation coefficients between the two paired objects, and texture
features were used as input to train a similarity classifier that estimated a similarity score for each pair. Finally, a
linear  discriminant  analysis  (LDA)  classifier  was  used  to  fuse  the  score  from  the  single-view  CAD  system  and  the 
similarity  score.  A  data  set  of  475  patients  containing  972  mammograms  with  475  biopsy-proven  masses  was  used  to 
train  and  test  the  CAD  system.  All  cases  contained  the  CC  view  and  the  MLO  or  LM  view.  We  randomly  divided  the 
data  set  into  two  independent  sets  of  243  cases  and  232  cases.  The  training  and  testing  were  performed  using  the  2-fold 
cross  validation  method.  The  detection  performance  of  the  CAD  system  was  assessed  by  free  response  receiver 
operating  characteristic  (FROC)  analysis.  The  average  test  FROC  curve  was  obtained  from  averaging  the  FP  rates  at 
the  same  sensitivity  along  the  two  corresponding  test  FROC  curves  from  the  2-fold  cross  validation.  At  the  case-based 
sensitivities  of  90%,  85%  and  80%  on  the  test  set,  the  single-view  CAD  system  achieved  an  FP  rate  of  2.0,  1.5,  and  1.2 
FPs/image,  respectively.  With  the  two-view  fusion  system,  the  FP  rates  were  reduced  to  1.7,  1.3,  and  1.0  FPs/image, 
respectively, at the corresponding sensitivities. The improvement was found to be statistically significant (p < 0.05) by
the  AFROC  method.  Our  results  indicate  that  the  two-view  fusion  scheme  can  improve  the  performance  of  mass 
detection  on  mammograms. 
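The pairing step above matches objects across views by their distance to the nipple. A minimal sketch of that gating step follows, assuming object centroids and nipple locations in pixel coordinates; the function names, the 30 mm tolerance, and the 0.1 mm pixel size are illustrative assumptions, not values from this study.

```python
import math


def nipple_object_distance_mm(obj, nipple, pixel_mm):
    """Euclidean distance from an object centroid to the nipple, in mm."""
    return math.hypot(obj[0] - nipple[0], obj[1] - nipple[1]) * pixel_mm


def candidate_pairs(cc_objs, mlo_objs, cc_nipple, mlo_nipple,
                    pixel_mm=0.1, tol_mm=30.0):
    """Pair CC and MLO candidates whose nipple-to-object distances (NODs) agree.

    cc_objs, mlo_objs: lists of (row, col) centroids in pixels.
    Returns (i, j, |NOD difference| in mm) tuples sorted by the difference.
    """
    pairs = []
    for i, a in enumerate(cc_objs):
        nod_a = nipple_object_distance_mm(a, cc_nipple, pixel_mm)
        for j, b in enumerate(mlo_objs):
            diff = abs(nod_a - nipple_object_distance_mm(b, mlo_nipple, pixel_mm))
            if diff <= tol_mm:
                pairs.append((i, j, diff))
    return sorted(pairs, key=lambda p: p[2])
```

The surviving pairs would then be scored by the similarity classifier and fused with the single-view score; those stages are not sketched here.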

Keywords:  computer-aided  detection,  two-view  fusion,  mass  detection,  AFROC  analysis 


1.  INTRODUCTION 

Breast cancer is one of the leading causes of cancer mortality among women1. There is considerable evidence
that early diagnosis and treatment significantly improve the chance of survival for patients with breast cancer2-5.
Although mammography has a high sensitivity for detection of breast cancers when compared to other imaging
modalities, studies indicate that radiologists do not detect all carcinomas that are visible upon retrospective analyses of
the images6-11. It has been shown that computer-aided detection (CAD) can improve the sensitivity of mammography
in prospective clinical trials12-15. CAD is thus a viable cost-effective alternative to double reading by radiologists.

Mass detection systems to date have generally employed a single-view detection approach, using various techniques for prescreening of mass candidates and classification of true and false positives16-25. We have been developing CAD systems for detection of mammographic masses on full field digital mammograms (FFDMs)25 and screening film mammograms (SFMs)22. Our previous study23 showed that a two-view fusion method can improve the performance of a CAD system for mass detection on mammograms. In this study, our purpose is to improve the performance of the two-view information fusion method and to test it on a larger data set.


2.  MATERIALS  AND  METHODS 


2.1  Materials 


Medical  Imaging  2006:  Image  Processing,  edited  by  Joseph  M.  Reinhardt,  Josien  P.  W.  Pluim, 
All mammograms in this study were collected from patient files in the Department of Radiology at the University of Michigan with Institutional Review Board (IRB) approval. The mammograms were digitized with a LUMISYS 85 laser film scanner with a pixel size of 50 µm × 50 µm and 4096 gray levels. The scanner was calibrated to have a linear relationship between gray levels and optical densities (O.D.) from 0.1 to greater than 3 O.D. units. The nominal O.D. range of the scanner is 0-4. The full resolution mammograms were first smoothed with a 2×2 box filter and subsampled by a factor of 2, resulting in images with a pixel size of 100 µm × 100 µm. These images were used as the input to our CAD system. The data set used in this study contained 475 cases, of which 464 cases had two-view mammograms (the craniocaudal (CC) view and either the mediolateral oblique (MLO) view or the lateral view) and 11 cases had four-view mammograms, resulting in a total of 972 mammograms. All mammograms were obtained before biopsy. There were 475 biopsy-proven masses in this data set.
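Because the 2×2 box filter is followed by subsampling by a factor of 2, the two operations together amount (assuming the box is aligned with the subsampling grid) to averaging non-overlapping 2×2 pixel blocks. The minimal NumPy sketch below illustrates this reduction from 50 µm to 100 µm pixels and checks the image count given above; the function name is hypothetical and is not taken from the authors' software.

```python
import numpy as np

def box_smooth_and_subsample(img: np.ndarray) -> np.ndarray:
    """2x2 box filter followed by subsampling by 2 (hypothetical helper).

    Assumes the box is aligned with the subsampling grid, so the result is
    the mean of each non-overlapping 2x2 block; an odd last row/column is dropped.
    """
    h2, w2 = img.shape[0] // 2, img.shape[1] // 2
    blocks = img[: 2 * h2, : 2 * w2].reshape(h2, 2, w2, 2)
    return blocks.mean(axis=(1, 3))

# Toy 4x4 "image" at 50 um pixels -> 2x2 image at 100 um pixels
small = box_smooth_and_subsample(np.arange(16, dtype=float).reshape(4, 4))
new_pixel_size_um = 2 * 50.0          # 100.0

# 464 two-view cases + 11 four-view cases
n_images = 464 * 2 + 11 * 4           # 972
```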


2.2  Methods 

2.2.1  Single-view  System  Overview 


Figure 1. Block diagram of the single-view CAD system for mass detection on mammograms.

Our single-view CAD system consists of four processing steps: 1) prescreening of mass candidates, 2) identification of suspicious objects, 3) extraction of morphological and texture features, and 4) classification of normal and abnormal regions by rule-based and LDA classifiers. The block diagram of the single-view CAD system is shown in Figure 1. Figure 2 shows an example demonstrating the processing steps of our computer-aided mass detection system. For the prescreening stage, we have developed a two-stage gradient field analysis method that combines the shape information of masses on mammograms with the gray level information of the local object, which is segmented by a region growing technique in the second stage to refine the gradient field analysis. The gradient field analysis is used to determine locations of high convergence of the radial gradient in the image. A region of interest (ROI) is then identified with its center placed at each location of high gradient convergence. The object in each ROI is segmented by a region growing method that uses the location of high gradient convergence as the starting point. Figures 2(b) and 2(c) show the initial detection locations and the grown objects, respectively. After region growing, all connected pixels constituting the object are labeled. Finally, the gradient convergence at the center location of the ROI is recalculated within the segmented object, and objects whose new gradient convergence is lower than 80% of the original value are rejected. After prescreening, the suspicious objects are identified by a clustering-based region growing method. For each suspicious object, eleven morphological features are extracted. Rule-based and LDA classifiers are trained to remove detected normal structures that are substantially different from breast masses.
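The prescreening logic can be sketched in Python as follows. This is a simplified stand-in, not the authors' two-stage gradient field analysis: the hypothetical `convergence` function scores a location by the mean cosine between the local gray-level gradient and the direction toward the candidate center (assuming masses appear brighter than their surroundings), and `region_grow` uses an assumed fixed gray-level tolerance around the seed. Only the 80% rejection rule is taken directly from the text.

```python
import numpy as np
from collections import deque

def convergence(img, center, radius, mask=None, eps=1e-12):
    """Mean cosine between the gradient and the direction toward `center`
    (hypothetical simplified measure; 1.0 means perfectly convergent)."""
    gy, gx = np.gradient(img.astype(float))
    yy, xx = np.indices(img.shape)
    dy, dx = center[0] - yy, center[1] - xx          # vectors pointing at the center
    dist = np.hypot(dy, dx)
    gmag = np.hypot(gy, gx)
    region = (dist > 0) & (dist <= radius) & (gmag > eps)
    if mask is not None:
        region &= mask                               # restrict to the grown object
    if not region.any():
        return 0.0
    cos = (gy * dy + gx * dx)[region] / (gmag[region] * dist[region])
    return float(cos.mean())

def region_grow(img, seed, tol):
    """4-connected region growing from `seed`; a neighbor joins if its gray
    level is within `tol` of the seed value (assumed acceptance rule)."""
    h, w = img.shape
    ref = img[seed]
    mask = np.zeros(img.shape, dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and abs(img[ny, nx] - ref) <= tol):
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

def keep_candidate(img, center, radius, tol, ratio=0.8):
    """Recompute convergence inside the grown object and reject the candidate
    if it falls below `ratio` (80% in the text) of the original value."""
    original = convergence(img, center, radius)
    obj = region_grow(img, center, tol)
    refined = convergence(img, center, radius, mask=obj)
    return refined >= ratio * original, original, refined

# Toy example: a bright Gaussian blob standing in for a mass
yy, xx = np.indices((41, 41))
blob = np.exp(-((yy - 20) ** 2 + (xx - 20) ** 2) / (2 * 6.0 ** 2))
keep, before, after = keep_candidate(blob, (20, 20), radius=15, tol=0.7)
```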
Global and local multiresolution texture analyses are performed in each ROI by using spatial gray level dependence (SGLD) matrices at different pixel spacings and angular directions. To obtain the best feature subset and reduce the dimensionality of the feature space for designing a robust classifier, feature selection with stepwise linear discriminant analysis is performed. Finally, LDA classification is used to identify potential breast masses. Figure 2(d) shows the final detected objects, and Figure 2(e) shows the locations of these objects superimposed on the mammogram.
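An SGLD (co-occurrence) matrix counts how often gray levels i and j occur at pixel pairs separated by a given distance d along a given angle. The Python sketch below builds a symmetric, normalized SGLD matrix for one (d, angle) pair and derives three common texture measures from it; the offset convention (0°, 45°, 90°, 135°), the prior quantization of gray levels, and the choice of features are assumptions, and the multiresolution decomposition used in the paper is omitted.

```python
import numpy as np

def _offset(d, angle_deg):
    """(row, col) displacement for the usual 0/45/90/135 degree directions (assumed)."""
    return {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}[angle_deg]

def sgld_matrix(img, d, angle_deg, levels):
    """Symmetric, normalized SGLD (co-occurrence) matrix of a quantized image."""
    img = np.asarray(img, dtype=int)
    dy, dx = _offset(d, angle_deg)
    h, w = img.shape
    # Ranges where both pixels of the pair (y, x), (y+dy, x+dx) lie inside the image
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    a = img[y0:y1, x0:x1]
    b = img[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    m = m + m.T                      # count both (i, j) and (j, i)
    return m / m.sum()

def sgld_features(p):
    """A few common texture measures of a normalized SGLD matrix."""
    i, j = np.indices(p.shape)
    nz = p > 0
    return {
        "energy": float(np.sum(p ** 2)),
        "contrast": float(np.sum((i - j) ** 2 * p)),
        "entropy": float(-np.sum(p[nz] * np.log(p[nz]))),
    }

# Small quantized test image with 4 gray levels
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
p0 = sgld_matrix(img, d=1, angle_deg=0, levels=4)
features = sgld_features(p0)
```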


Figure 2. An example demonstrating the processing steps with our single-view CAD system for mass detection: (a) original image, (b) prescreened image, (c) identified suspicious objects, (d) detection result, (e) image with detected objects.


2.2.2 Two-View Fusion

To improve the overall performance of our CAD system for mass detection, we developed a two-view fusion technique that combines the information from two mammographic views. The fusion method used in this study is based on the assumption that a true mass imaged on two different mammographic views will exhibit similar geometric, morphological and textural features, because these features are relatively invariant with respect to the imaging view. In contrast, FPs detected by the CAD system are expected to exhibit a lesser degree of similarity because they are usually objects formed by different normal tissues.

For a given object on one view, geometric pairing is first performed using the nipple-to-object distance as the average radius of an annular region on the other view, within which the detected objects can be paired with the given object. Manually identified nipple locations are used for the registration in this study. We are developing an automated nipple detection technique26, which will be used once it reaches high accuracy. Similarity measures between each pair of objects are derived from the pairs of individual object features. The similarity features include morphological features, a Hessian feature, correlation coefficients between the two paired objects, and texture features. A similarity classifier is trained to distinguish between true and false pairs by merging the similarity features into a similarity score for each object. The similarity score and the single-view score of the object are then fused to form a final score for the object. Our two-view system is summarized in Figure 3.
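The geometric pairing and score fusion steps can be sketched in Python as below. The annulus half-width, the rule of keeping an object's best similarity score over its candidate pairs, and all function names are hypothetical choices for illustration. The fusion uses a two-feature Fisher LDA, consistent with the LDA fusion described in the abstract, but it is not the authors' trained classifier.

```python
import numpy as np

def nipple_distance(obj, nipple):
    """Euclidean distance (pixels) from an object centroid to the nipple."""
    return float(np.hypot(obj[0] - nipple[0], obj[1] - nipple[1]))

def geometric_pairs(objs_a, nipple_a, objs_b, nipple_b, half_width):
    """Pair object i on view A with each object j on view B whose nipple
    distance lies in an annulus of mean radius d_A(i); `half_width` is a
    hypothetical tuning parameter."""
    pairs = []
    for i, oa in enumerate(objs_a):
        r = nipple_distance(oa, nipple_a)
        for j, ob in enumerate(objs_b):
            if abs(nipple_distance(ob, nipple_b) - r) <= half_width:
                pairs.append((i, j))
    return pairs

def best_similarity(n_objects, pairs, pair_scores, default=0.0):
    """Assumption: an object's similarity score is the maximum over its
    candidate pairs; objects with no partner receive `default`."""
    best = np.full(n_objects, -np.inf)
    for (i, _), s in zip(pairs, pair_scores):
        best[i] = max(best[i], s)
    best[np.isinf(best)] = default
    return best

def fit_lda(X, y):
    """Fisher LDA weights w = S_w^{-1} (m1 - m0) for a two-class problem."""
    X1, X0 = X[y == 1], X[y == 0]
    Sw = (np.cov(X1, rowvar=False) * (len(X1) - 1)
          + np.cov(X0, rowvar=False) * (len(X0) - 1))
    return np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))

# Pairing: object at nipple distance 50 on view A; candidates at 52 and 80 on view B
pairs = geometric_pairs([(30, 40)], (0, 0), [(0, 52), (0, 80)], (0, 0), half_width=5)
sim = best_similarity(1, pairs, [0.8])

# Fusion on toy data; columns: single-view score, similarity score
rng = np.random.default_rng(0)
y = np.repeat([1, 0], 200)
X = rng.normal(size=(400, 2)) + y[:, None]
w = fit_lda(X, y)
fused = X @ w                        # final fused object score
```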


Figure  3.  Block  diagram  of  the  two-view  CAD  system  for  mass  detection  on  mammograms. 


3. EXPERIMENTAL RESULTS

We randomly separated the cases in our data set into two independent data sets of approximately equal size: 243 cases with 494 images and 232 cases with 478 images. The training and testing were performed using the 2-fold cross validation method. The detection performance of the CAD system was assessed by free response receiver operating characteristic (FROC) analysis. FROC curves were presented on a per-mammogram and a per-case basis. For mammogram-based FROC analysis, the mass on each mammogram was considered an independent true object. For case-based FROC analysis, the same mass imaged on the two-view mammograms was considered to be one true object, and the detection of either or both masses on the two views was considered to be a true positive (TP). To evaluate the overall test performance, an average test FROC curve was obtained by averaging the FP rates at the same sensitivity along the two corresponding test FROC curves from the 2-fold cross validation. When the single-view CAD system was applied to the test set, the FP rates were 2.0, 1.5, and 1.2 FPs/image at case-based sensitivities of 90%, 85% and 80%, respectively. With the two-view CAD system, the FP rates were reduced to 1.7, 1.3, and 1.0 FPs/image at the same case-based sensitivities. Figures 4 and 5 compare the test performance of the single-view and two-view CAD systems using image-based and case-based average FROC curves, respectively. To analyze the improvement in the FROC curves statistically, the alternative free-response ROC (AFROC) method27 was employed. In the AFROC method, false-positive images (FPIs) instead of FPs per image are counted. The confidence rating of an FPI is determined by the highest-confidence FP decision on the image, regardless of how many lower-confidence FP decisions are made on the same image. The ROCKIT software developed by Metz et al.28 was used to analyze the AFROC data. The A1 values and the p values are compared in Table 1.

Proc. of SPIE Vol. 6144 614424-4
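The scoring conventions described above can be made concrete with a short Python sketch: image-based versus case-based TP scoring, averaging two test FROC curves at equal sensitivity (here by linear interpolation, which is an assumption), and the AFROC rule that rates each false-positive image by its highest-confidence FP. The helper names and toy numbers are hypothetical, and the sketch does not reproduce ROCKIT's curve fitting.

```python
import numpy as np

def froc_points(tp_scores, fp_scores, n_images, thresholds):
    """Sensitivity and FPs/image at each decision threshold.

    tp_scores: for each true mass, the highest score of a detection that hits
    it (-inf if never marked); fp_scores: scores of all false positives.
    """
    tp = np.asarray(tp_scores, dtype=float)
    fp = np.asarray(fp_scores, dtype=float)
    sens = np.array([(tp >= t).mean() for t in thresholds])
    fppi = np.array([(fp >= t).sum() / n_images for t in thresholds])
    return fppi, sens

def case_based_tp_scores(tp_view1, tp_view2):
    """Case-based scoring: a mass counts as detected if it is marked on
    either view, so its case-level score is the larger of the two."""
    return np.maximum(tp_view1, tp_view2)

def average_froc(target_sens, curves):
    """Average FPs/image of several test FROC curves at equal sensitivity,
    interpolating each curve linearly between its operating points."""
    rates = []
    for fppi, sens in curves:
        order = np.argsort(sens)
        rates.append(np.interp(target_sens, np.asarray(sens)[order],
                               np.asarray(fppi)[order]))
    return np.mean(rates, axis=0)

def afroc_fpi_ratings(fp_scores_per_image):
    """AFROC: each image yields one FP-image rating, the highest FP
    confidence on it (-inf if the image has no FP)."""
    return np.array([max(s) if len(s) else -np.inf for s in fp_scores_per_image])

# Two toy test FROC curves (FPs/image, sensitivity) from a 2-fold split
curve1 = (np.array([0.5, 1.0, 2.0]), np.array([0.70, 0.80, 0.90]))
curve2 = (np.array([0.6, 1.2, 2.4]), np.array([0.70, 0.80, 0.90]))
avg = average_froc([0.80], [curve1, curve2])   # mean of 1.0 and 1.2
```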


Figure 4. Image-based average FROC curves (horizontal axis: number of false positives per image) obtained by averaging the corresponding FROC curves of the two test subsets. Single-view: detection by the single-view CAD system. Two-view: detection by the two-view CAD system.

Figure 5. Case-based average FROC curves (horizontal axis: number of false positives per image) obtained by averaging the corresponding FROC curves of the two test subsets. Single-view: detection by the single-view CAD system. Two-view: detection by the two-view CAD system.


Table 1. Estimation of the statistical significance of the difference between the FROC performances of the single-view CAD system and the two-view CAD system.

                  A1 (AFROC)
                  Test Set 1    Test Set 2
One-view CAD      0.52          0.51
Two-view CAD      0.55          0.54
p value           <0.0001       <0.0001

4.  DISCUSSION  AND  CONCLUSIONS 

In this study, we developed a two-view CAD system to improve the computerized detection of masses on mammograms. The two-view CAD system differs from case-based scoring, in which detection of the same mass in either the CC view or the MLO view is counted as a true positive: in the two-view system, the objects detected in the two views are correlated and analyzed for similarity, so that the likelihood score of a mass detected in both views can be enhanced relative to that of FPs. Our results indicate that two-view fusion can significantly improve the overall performance of the single-view CAD system. Future work will include automated identification of nipple locations and optimization of the fusion scheme in our system.
ABSTRACT

In  computer-aided  detection  (CAD)  applications,  an  important  step  is  to  design  a  classifier  for  the 
differentiation  of  the  abnormal  from  the  normal  structures.  We  have  previously  developed  a  stepwise  linear 
discriminant  analysis  (LDA)  method  with  simplex  optimization  for  this  purpose.  In  this  study,  our  goal  was  to 
investigate  the  performance  of  a  regularized  discriminant  analysis  (RDA)  classifier  in  combination  with  a  feature 
selection  method  for  classification  of  the  masses  and  normal  tissues  detected  on  full  field  digital  mammograms  (FFDM). 
The  feature  selection  scheme  combined  a  forward  stepwise  feature  selection  process  and  a  backward  stepwise  feature 
elimination  process  to  obtain  the  best  feature  subset.  An  RDA  classifier  and  an  LDA  classifier  in  combination  with  this 
new  feature  selection  method  were  compared  to  an  LDA  classifier  with  stepwise  feature  selection.  A  data  set  of  130 
patients  containing  260  mammograms  with  130  biopsy-proven  masses  was  used.  All  cases  had  two  mammographic 
views.  The  true  locations  of  the  masses  were  identified  by  experienced  radiologists.  To  evaluate  the  performance  of 
the  classifiers,  we  randomly  divided  the  data  set  into  two  independent  sets  of  approximately  equal  size  for  training  and 
testing.  The  training  and  testing  were  performed  using  the  2-fold  cross  validation  method.  The  detection  performance 
of  the  CAD  system  was  assessed  by  free  response  receiver  operating  characteristic  (FROC)  analysis.  The  average  test 
FROC  curve  was  obtained  by  averaging  the  FP  rates  at  the  same  sensitivity  along  the  two  corresponding  test  FROC 
curves from the 2-fold cross validation. At case-based sensitivities of 90%, 80% and 70% on the test set, our RDA classifier with the new feature selection scheme achieved FP rates of 1.8, 1.1, and 0.6 FPs/image, respectively, compared with 2.1, 1.4, and 0.8 FPs/image for stepwise LDA with simplex optimization. Our results indicate that RDA
in  combination  with  the  sequential  forward  inclusion-backward  elimination  feature  selection  method  can  improve  the 
performance  of  mass  detection  on  mammograms.  Further  work  is  underway  to  optimize  the  feature  selection  and 
classification  scheme  and  to  evaluate  if  this  approach  can  be  generalized  to  other  CAD  classification  tasks. 

Keywords:  computer-aided  detection,  full  field  digital  mammogram,  mass  detection,  regularized  discriminant  analysis, 
feature  selection 


1.  INTRODUCTION 

Breast cancer is the most common cancer among American women1. Early detection and diagnosis can significantly increase the survival rate2-4. Recent clinical studies have shown that computer-aided detection (CAD) systems are helpful for increasing radiologists' accuracy in detecting breast cancers5-8.

We have been developing CAD systems for detection and characterization of mammographic masses and microcalcifications. Detection of masses on mammograms is more challenging than detection of microcalcifications because normal fibroglandular tissue in the breast causes false positives (FPs) by mimicking masses and causes false negatives by overlapping with the lesions. Therefore, mass detection systems generally have lower sensitivity and higher FP rates than microcalcification detection systems. We are investigating methods to improve the overall performance of our CAD systems.

FP classification is an important step in a CAD system. The basic approach in two-class classification is to assign an unknown sample to one of the two classes on the basis of its location in a multidimensional feature space. A number of methods have been proposed in previous studies9-11. Most of the methods are based on linear discriminant analysis (LDA), artificial neural networks, and rule-based classifiers1". Recently, support vector machines were used to classify malignant and benign clustered microcalcifications on mammograms13. In medical imaging applications, a major problem in classifier design is the limited sample size available, which biases the performance of the trained classifier on unknown cases. In this study, we investigated the performance of a regularized discriminant analysis (RDA) classifier in combination with a feature selection method for classification of the masses and normal tissues detected on full field digital mammograms (FFDMs).


2.  MATERIALS  AND  METHODS 


2.1  Materials 

IRB approval was obtained prior to the commencement of this investigation. The images used in this study were acquired before biopsy at the University of Michigan with a GE Senographe 2000D FFDM system. The GE system has a CsI phosphor/a-Si active matrix flat panel digital detector with a pixel size of 100 µm × 100 µm and 14 bits per pixel. A data set of 130 cases was used. All cases had two mammographic views, the craniocaudal (CC) view and the mediolateral oblique (MLO) view or the lateral (LM or ML) view. The data set contained 130 biopsy-proven masses. The true locations of the masses were identified by a Mammography Quality Standards Act (MQSA) radiologist.

2.2  Methods 

2.2.1  Discriminant  Analysis 

Assume that the class distributions in a two-class classification problem are multivariate normal. Under this condition, discriminant analysis models differ essentially in their specific assumptions on the mean vectors and covariance matrices of the group conditional densities. The most commonly used model is linear discriminant analysis (LDA), which assumes that the group conditional distributions are multivariate normal with mean vectors $\mu_k$, where k = 1, 2 is the class index, and a common covariance matrix $\Sigma$. The LDA discriminant function is given in Eq. (1).

$$F = (\mu_1 - \mu_2)^T \Sigma^{-1} X \qquad (1)$$

where $X^T = (x_1, \ldots, x_n)$ is the feature vector of a sample and n is the dimensionality of the feature space. If the covariance matrices $\Sigma_1$ and $\Sigma_2$ are not equal, one can use quadratic discriminant analysis (QDA), which contains a quadratic term in the feature vector, as given in Eq. (2).

$$Y = \tfrac{1}{2} X^T \left(\Sigma_2^{-1} - \Sigma_1^{-1}\right) X - \left(\mu_2^T \Sigma_2^{-1} - \mu_1^T \Sigma_1^{-1}\right) X \qquad (2)$$
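Eqs. (1) and (2) translate directly into a few lines of NumPy. This sketch, with hypothetical function names and toy parameters, evaluates both discriminants and confirms that the QDA score reduces to the LDA score when $\Sigma_1 = \Sigma_2$. Like the equations, it omits the constant offset, which only shifts the decision threshold.

```python
import numpy as np

def lda_score(x, mu1, mu2, sigma):
    """Eq. (1): F = (mu1 - mu2)^T Sigma^{-1} x."""
    return float((mu1 - mu2) @ np.linalg.solve(sigma, x))

def qda_score(x, mu1, mu2, sigma1, sigma2):
    """Eq. (2): Y = 1/2 x^T (S2^-1 - S1^-1) x - (mu2^T S2^-1 - mu1^T S1^-1) x."""
    s1_inv, s2_inv = np.linalg.inv(sigma1), np.linalg.inv(sigma2)
    quadratic = 0.5 * x @ (s2_inv - s1_inv) @ x
    linear = (mu2 @ s2_inv - mu1 @ s1_inv) @ x
    return float(quadratic - linear)

# Toy two-class parameters (hypothetical)
mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
x = np.array([0.5, -0.2])
f = lda_score(x, mu1, mu2, sigma)
y_equal = qda_score(x, mu1, mu2, sigma, sigma)   # equals f when S1 == S2
```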

The parameters in LDA and QDA are usually unknown and have to be estimated from training samples. In medical imaging applications, the sample size may be very small in comparison with the dimensionality of the feature space. A regularization technique for discriminant analysis, referred to as regularized discriminant analysis (RDA)14, uses a complexity parameter $\lambda$ and a shrinkage parameter $\gamma$ to design an intermediate classification model between LDA and QDA. The regularized covariance matrices can be written as:

$$\hat{\Sigma}_k(\lambda, \gamma) = (1 - \gamma)\,\hat{\Sigma}_k(\lambda) + \frac{\gamma}{n}\,\mathrm{tr}\left[\hat{\Sigma}_k(\lambda)\right] I, \qquad k = 1, 2 \qquad (3)$$

where I is the identity matrix, n is the dimensionality of the feature space, $\hat{\Sigma}_k(\lambda)$ is the estimated covariance matrix of class k shrunk toward the pooled covariance matrix by the complexity parameter $\lambda$, and $\gamma$ is the shrinkage parameter. With $\gamma = 0$, $\lambda = 0$ gives QDA and $\lambda = 1$ gives LDA. In this work, we investigated the use of the RDA classifier for FP reduction in a mass CAD system.
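A minimal NumPy sketch of the regularized covariance estimate in Eq. (3) follows. It assumes the simple blend $\hat{\Sigma}_k(\lambda) = (1-\lambda)\hat{\Sigma}_k + \lambda\hat{\Sigma}_{pooled}$ for the complexity step (Friedman's original formulation also weights this blend by the class sample sizes), so it should be read as an illustration of the idea rather than the exact estimator used in the study. The shrinkage step preserves the trace of the covariance matrix, which the usage checks below exercise.

```python
import numpy as np

def rda_covariances(X1, X2, lam, gamma):
    """Regularized class covariance matrices in the spirit of Eq. (3).

    Complexity step (simplifying assumption): S_k(lam) = (1 - lam) S_k + lam S_pooled.
    Shrinkage step (Eq. 3): (1 - gamma) S_k(lam) + gamma / n * tr[S_k(lam)] I.
    """
    covs = [np.atleast_2d(np.cov(X, rowvar=False)) for X in (X1, X2)]
    n1, n2 = len(X1), len(X2)
    pooled = ((n1 - 1) * covs[0] + (n2 - 1) * covs[1]) / (n1 + n2 - 2)
    n = pooled.shape[0]                      # dimensionality of the feature space
    regularized = []
    for S_k in covs:
        blend = (1.0 - lam) * S_k + lam * pooled
        shrunk = (1.0 - gamma) * blend + (gamma / n) * np.trace(blend) * np.eye(n)
        regularized.append(shrunk)
    return regularized

rng = np.random.default_rng(0)
X1 = rng.normal(size=(30, 4))            # class 1 feature vectors (toy data)
X2 = 2.0 * rng.normal(size=(40, 4))      # class 2 feature vectors (toy data)
R1, R2 = rda_covariances(X1, X2, lam=0.5, gamma=0.2)
```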


Proc.  of  SPIE  Vol.  6144  61445P-2 


2.2.2  CAD  System  Overview 


Figure  1.  Block  diagram  of  CAD  system  for  mass  detection  on  FFDMs. 

Our CAD system consists of five processing steps: (1) preprocessing by multi-scale enhancement, (2) prescreening of mass candidates, (3) identification of suspicious objects, (4) feature extraction and analysis, and (5) FP reduction by classification of normal tissue structures and masses. The block diagram of the scheme is shown in Figure 1. FFDMs are generally preprocessed with proprietary methods before being displayed to readers. To develop a CAD system that is less dependent on specific FFDM systems, we use the raw digital images as input to our system. A preprocessing scheme based on a multi-resolution method15 has been developed for image enhancement. This scheme consists of three steps. First, the boundary of the breast is detected automatically using Otsu's method16. Second, the image is decomposed into multiple scales with the Laplacian pyramid, and a nonlinear weight function is designed to enhance each high-pass component. Finally, the Gaussian pyramid is used to reconstruct the image from the multiple scales. An example of an original mammogram and the corresponding enhanced mammogram is shown in Figs. 2(a) and 2(b), respectively. After preprocessing, gradient field analysis is used to detect mass candidates on the preprocessed FFDMs. The suspicious objects are then identified by a clustering-based region growing method. Figures 2(c) and 2(d) show the initial detection locations and the grown objects, respectively. For each suspicious object, eleven morphological features are extracted, and rule-based and discriminant classifiers are trained to remove detected normal structures that are substantially different from breast masses. Global and local multiresolution texture analyses17,18 are performed in each region of interest using spatial gray level dependence matrices. Finally, discriminant classification is used to identify potential breast masses. Further details of this algorithm can be found in the literature19.
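The three preprocessing steps can be sketched with NumPy as follows. Otsu's threshold is implemented from its standard between-class-variance definition. The pyramid, however, is a simplified stand-in (2×2 block averaging and pixel replication instead of the Burt-Adelson Gaussian kernel), and the nonlinear weight function is a hypothetical example that boosts low-contrast detail most. Which side of the Otsu threshold corresponds to the breast depends on the image polarity; the toy image assumes the breast is brighter.

```python
import numpy as np

def otsu_threshold(img, bins=256):
    """Otsu (1979): the gray level that maximizes the between-class variance."""
    hist, edges = np.histogram(img, bins=bins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)                 # probability of the lower class
    mu = np.cumsum(p * centers)       # first moment of the lower class
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu[-1] * w0 - mu) ** 2 / (w0 * (1.0 - w0))
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return edges[int(np.argmax(between)) + 1]

def _down(img):
    """Halve resolution by 2x2 block averaging (simplified reduce step)."""
    h, w = img.shape[0] // 2, img.shape[1] // 2
    return img[: 2 * h, : 2 * w].reshape(h, 2, w, 2).mean(axis=(1, 3))

def _up(img, shape):
    """Double resolution by pixel replication, edge-padded to `shape`."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    pad = ((0, shape[0] - up.shape[0]), (0, shape[1] - up.shape[1]))
    return np.pad(up, pad, mode="edge")

def laplacian_pyramid(img, levels):
    """High-pass bands plus the coarsest low-pass image."""
    gauss = [np.asarray(img, dtype=float)]
    for _ in range(levels):
        gauss.append(_down(gauss[-1]))
    bands = [g - _up(g_next, g.shape) for g, g_next in zip(gauss, gauss[1:])]
    return bands, gauss[-1]

def reconstruct(bands, base):
    """Invert the pyramid: expand and add the bands from coarse to fine."""
    img = base
    for band in reversed(bands):
        img = band + _up(img, band.shape)
    return img

def enhance(img, levels=3, gain=2.0):
    """Boost each high-pass band with a hypothetical nonlinear weight that
    amplifies small coefficients most and tends to 1 for large ones."""
    bands, base = laplacian_pyramid(img, levels)
    boosted = []
    for band in bands:
        scale = band.std() + 1e-12
        weight = 1.0 + (gain - 1.0) * np.exp(-np.abs(band) / scale)
        boosted.append(band * weight)
    return reconstruct(boosted, base)

rng = np.random.default_rng(0)
img = np.zeros((64, 64))
img[:, 20:] = 1000.0                 # toy "breast" region (assumed brighter)
img += rng.normal(0.0, 10.0, img.shape)
t = otsu_threshold(img)
breast_mask = img > t
enhanced = enhance(img)
```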

To obtain the best texture feature subset and reduce the dimensionality of the feature space for designing an effective classifier, feature selection was applied to the training set. Stepwise LDA feature selection with Wilks' lambda as the selection criterion was employed in our previous study. A simplex optimization procedure was used to choose the best set of feature selection parameters, which include a threshold $F_{in}$ for feature entry, a threshold for feature removal, and a tolerance threshold T for excluding features that are highly correlated with the features already in the selected pool. In this study, we compared a new stepwise feature selection procedure with the current method. In the proposed method, a feature selection scheme that combines forward stepwise feature selection and backward stepwise feature elimination is used to obtain the best feature subset, with the area under the receiver operating characteristic (ROC) curve, Az, as the selection criterion instead of Wilks' lambda. We evaluated the classifier performance using a leave-one-case-out resampling scheme within the training set; the test discriminant scores of the left-out cases were analyzed using ROC methodology. The discriminant scores were input as the decision variable to the LABROC program, which fits a binormal ROC curve by maximum likelihood estimation. The performances of the RDA classifier and the LDA classifier, both with the new feature selection method, were compared with that of the LDA classifier using Wilks' lambda as the stepwise feature selection criterion, in terms of their Az for the classification of masses and normal tissue.
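The forward inclusion/backward elimination search with an Az-type criterion can be sketched in Python as below. Two substitutions are assumptions: the empirical (Mann-Whitney) area under the ROC curve replaces the binormal Az fitted by LABROC, and a plain Fisher LDA is used inside the leave-one-case-out loop (an RDA classifier could be substituted). The improvement tolerance `tol` and all function names are hypothetical.

```python
import numpy as np

def auc(scores, labels):
    """Empirical (Mann-Whitney) area under the ROC curve; ties count 1/2."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def lda_fit(X, y):
    """Fisher LDA weights using the pooled within-class covariance matrix."""
    X1, X0 = X[y == 1], X[y == 0]
    centered = np.vstack([X1 - X1.mean(axis=0), X0 - X0.mean(axis=0)])
    S = centered.T @ centered / (len(X) - 2)
    return np.linalg.solve(S, X1.mean(axis=0) - X0.mean(axis=0))

def loco_auc(X, y, cases):
    """Leave-one-case-out: score each case's samples with an LDA trained on
    the other cases, then compute one AUC from the pooled test scores."""
    scores = np.empty(len(y))
    for c in np.unique(cases):
        test = cases == c
        w = lda_fit(X[~test], y[~test])
        scores[test] = X[test] @ w
    return auc(scores, y)

def forward_backward_select(X, y, cases, tol=1e-3):
    """Sequential forward inclusion with backward elimination, using the
    leave-one-case-out AUC as the selection criterion."""
    selected, best = [], 0.5
    while True:
        changed = False
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        if remaining:                       # forward step: add the most helpful feature
            a, j = max((loco_auc(X[:, selected + [j]], y, cases), j) for j in remaining)
            if a > best + tol:
                selected.append(j)
                best, changed = a, True
        while len(selected) > 1:            # backward step: drop features that hurt
            a, j = max((loco_auc(X[:, [f for f in selected if f != j]], y, cases), j)
                       for j in selected)
            if a > best + tol:
                selected.remove(j)
                best, changed = a, True
            else:
                break
        if not changed:
            return selected, best

rng = np.random.default_rng(1)
y = np.tile([1, 0], 40)                     # 80 samples
cases = np.arange(80) // 2                  # two samples per case
X = rng.normal(size=(80, 4))
X[:, 0] += 2.5 * y                          # only feature 0 is informative
selected, best_auc = forward_backward_select(X, y, cases)
```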


Figure 2. An example demonstrating the processing steps with our computer-aided mass detection system. (f) Image with detected objects.


3.  RESULTS 

We randomly separated the cases in our data set into two independent data subsets of 66 and 64 cases. The training and testing were performed using the 2-fold cross validation method. The detection performance of the CAD system was assessed by free response receiver operating characteristic (FROC) analysis. FROC curves were presented on a per-mammogram and a per-case basis. For mammogram-based FROC analysis, the mass on each mammogram was considered an independent true object. For case-based FROC analysis, the same mass imaged on the two-view mammograms was considered to be one true object, and the detection of either or both masses on the two views was considered to be a true positive (TP). The average test FROC curve was obtained by averaging the FP rates at the same sensitivity along the two corresponding test FROC curves from the 2-fold cross validation. The CAD system using RDA with the new feature selection method achieved image-based sensitivities of 60%, 65%, and 70% at 1.1, 1.4, and 1.6 FPs/image, respectively, compared with 1.4, 1.7, and 2.1 FPs/image for the CAD system using LDA with the new feature selection method. The CAD system with stepwise LDA and simplex optimization achieved FP rates of 1.6, 1.9, and 2.2 FPs/image, respectively, at the same sensitivities, which were comparable to those of the CAD system using LDA with the new feature selection method. The case-based FROC results are summarized in Table 1. Figures 3 and 4 compare the image-based and case-based average FROC curves, respectively, of the CAD systems using the three classification methods.

Table  1.  Comparison  of  case-based  performance  of  three  methods.  OFS:  stepwise  feature  selection  with  simplex 
optimization.  NFS:  feature  selection  combining  forward  feature  selection  and  backward  feature 
elimination. 


Case-based      FPs/image
sensitivity     LDA-OFS    LDA-NFS    RDA-NFS
70%             0.8        0.7        0.6
80%             1.4        1.3        1.1
90%             2.1        2.2        1.8

Figure 3. Comparison of image-based FROC curves (horizontal axis: number of false positives per image). OFS: stepwise feature selection with simplex optimization. NFS: feature selection combining forward feature selection and backward feature elimination.

Figure 4. Comparison of case-based FROC curves (horizontal axis: number of false positives per image). OFS: stepwise feature selection with simplex optimization. NFS: feature selection combining forward feature selection and backward feature elimination.


4.  DISCUSSION  AND  CONCLUSIONS 


We  previously  developed  a  CAD  system  for  detection  of  masses  on  FFDMs.  In  this  study,  we  investigated  the 
use  of  an  RDA  classifier  with  a  new  feature  selection  method.  Our  results  indicated  that  the  new  FP  classifier  can 
improve  the  overall  performance  of  our  CAD  system.  Further  work  is  underway  to  optimize  the  feature  selection  and 
classification  scheme  and  to  evaluate  if  this  approach  can  be  generalized  to  other  CAD  classification  tasks. 

ACKNOWLEDGMENTS

This work is supported by U. S. Army Medical Research and Materiel Command grants DAMD17-02-1-0214 and W81XWH-04-1-0475 and USPHS grant CA95153. The content of this paper does not necessarily reflect the position of the government, and no official endorsement of any equipment or product of any company mentioned should be inferred. The authors are grateful to Charles E. Metz, Ph.D., for the LABROC program.


REFERENCES 

1. American Cancer Society, "Statistics for 2004," www.cancer.org (2004).

2.  C.  R.  Smart,  R.  E.  Hendrick,  J.  H.  Rutledge  and  R.  A.  Smith,  "Benefit  of  mammography  screening  in  women  ages 
40  to  49  years:  Current  evidence  from  randomized  controlled  trials,"  Cancer  75,  1619-1626,  1995. 

3. S. A. Feig, C. J. D'Orsi, R. E. Hendrick, V. P. Jackson, D. B. Kopans, B. Monsees, E. A. Sickles, C. B. Stelling, M. Zinninger and P. Wilcox-Buchalla, "American College of Radiology guidelines for breast cancer screening," AJR American Journal of Roentgenology 171, 29-33, 1998.

4. B. Cady and J. S. Michaelson, "The life-sparing potential of mammographic screening," Cancer 91, 1699-1703, 2001.

5.  T.  W.  Freer  and  M.  J.  Ulissey,  "Screening  mammography  with  computer-aided  detection:  Prospective  study  of 
12,860  patients  in  a  community  breast  center,"  Radiology  220,  781-786,  2001. 

6.  S.  V.  Destounis,  P.  Dinitto,  W.  Logan-Young,  E.  Bonaccio,  M.  L.  Zuley  and  K.  M.  Willison,  "Can  computer-aided 
detection  with  double  reading  of  screening  mammograms  help  decrease  the  false-negative  rate?  Initial  experience," 
Radiology  232,  578-584,  2004. 

7.  M.  A.  Helvie,  L.  M.  Hadjiiski,  E.  Makariou,  H.  P.  Chan,  N.  Petrick,  B.  Sahiner,  S.  C.  B.  Lo,  M.  Freedman,  D.  Adler, 
J.  Bailey,  C.  Blane,  D.  Hoff,  K.  Hunt,  L.  Joynt,  K.  Klein,  C.  Paramagul,  S.  Patterson  and  M.  A.  Roubidoux, 
"Sensitivity  of  noncommercial  computer-aided  detection  system  for  mammographic  breast  cancer  detection  -  a  pilot 
clinical  trial,"  Radiology  231,  208-214,  2004. 

8.  R.  L.  Birdwell,  P.  Bandodkar  and  D.  M.  Ikeda,  "Computer-aided  detection  with  screening  mammography  in  a 
university  hospital  setting,"  Radiology  236,  451-457,  2005. 

9.  H.  P.  Chan,  S.  C.  B.  Lo,  B.  Sahiner,  K.  L.  Lam  and  M.  A.  Helvie,  "Computer-aided  detection  of  mammographic 
microcalcifications:  Pattern  recognition  with  an  artificial  neural  network,"  Medical  Physics  22,  1555-1567,  1995. 

10. H. P. Chan, B. Sahiner, R. F. Wagner and N. Petrick, "Classifier design for computer-aided diagnosis: Effects of finite sample size on the mean performance of classical and neural network classifiers," Medical Physics 26, 2654-2668, 1999.

11.  B.  Sahiner,  H.  P.  Chan,  N.  Petrick,  D.  Wei,  M.  A.  Helvie,  D.  D.  Adler  and  M.  M.  Goodsitt,  "Classification  of  mass 
and  normal  breast  tissue:  A  convolution  neural  network  classifier  with  spatial  domain  and  texture  images,"  IEEE 
Transactions  on  Medical  Imaging  15,  598-610,  1996. 

12. Q. Li and K. Doi, "Analysis and minimization of overtraining effect in rule-based classifiers for computer-aided diagnosis," Medical Physics 33, 320-328, 2006.

13. J. Wei, B. Sahiner, L. M. Hadjiiski, H. P. Chan, N. Petrick, M. A. Helvie, M. A. Roubidoux, J. Ge and C. Zhou, "Computer aided detection of breast masses on full field digital mammograms," Medical Physics (submitted), 2005.

14.  J.  Friedman,  "Regularized  discriminant  analysis,"  Journal  of  the  American  Statistical  Association  84,  165-175, 
1989. 

15. P. J. Burt and E. H. Adelson, "The Laplacian pyramid as a compact image code," IEEE Transactions on
Communications COM-31, 337-345, 1983.

16. N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and
Cybernetics 9, 62-66, 1979.

17. D. Wei, H. P. Chan, M. A. Helvie, B. Sahiner, N. Petrick, D. D. Adler and M. M. Goodsitt, "Classification of mass
and normal breast tissue on digital mammograms: Multiresolution texture analysis," Medical Physics 22, 1501-1513, 1995.

18.  D.  Wei,  H.  P.  Chan,  N.  Petrick,  B.  Sahiner,  M.  A.  Helvie,  D.  D.  Adler  and  M.  M.  Goodsitt,  "False-positive  reduction 
technique  for  detection  of  masses  on  digital  mammograms:  Global  and  local  multiresolution  texture  analysis," 
Medical  Physics  24,  903-914,  1997. 

19.  J.  Wei,  B.  Sahiner,  L.  M.  Hadjiiski,  H.  P.  Chan,  N.  Petrick,  M.  A.  Helvie,  M.  A.  Roubidoux,  J.  Ge  and  C.  Zhou, 
"Computer  aided  detection  of  breast  masses  on  full  field  digital  mammograms,"  Medical  Physics  32,  2827-2838, 
2005. 


Proc.  of  SPIE  Vol.  6144  61445P-6 


Computer  aided  detection  of  breast  masses  on  prior  mammograms 


Jun Wei, Berkman Sahiner, Heang-Ping Chan, Lubomir M. Hadjiiski, Marilyn A. Roubidoux,

Mark  A.  Helvie,  Jun  Ge,  Chuan  Zhou,  Yi-Ta  Wu 
Department  of  Radiology,  The  University  of  Michigan,  Ann  Arbor,  MI  48109 


ABSTRACT 

An important purpose of a CAD system is to serve as a second reader that alerts radiologists to subtle cancers that
may be overlooked. In this study, we developed new computer vision techniques to improve the detection
performance for subtle masses on prior mammograms. A data set of 159 patients containing 318 current mammograms
and 402 prior mammograms was collected. A new technique combining gradient field analysis with Hessian analysis
was developed to prescreen for mass candidates. A suspicious structure at each identified location was initially
segmented by seed-based region growing and then refined by an active contour method. Morphological, gray
level histogram and run-length statistics features were extracted. Rule-based and linear discriminant analysis (LDA)
classifiers were trained to differentiate masses from normal tissues. We randomly divided the data set into two
independent sets: one set of 78 cases for training and the other set of 81 cases for testing. With our previous CAD
system, the case-based sensitivities on prior mammograms were 63%, 48% and 32% at 2, 1 and 0.5 false positives (FPs)
per image, respectively. With the new CAD system, the case-based sensitivities improved to 74%, 56% and 35%,
respectively, at the same FP rates. The difference in the free-response receiver operating characteristic (FROC) curves
was statistically significant (p<0.05 by AFROC analysis). The performances of the two systems for detection of masses
on current mammograms were comparable. The results indicate that the new CAD system can improve the detection
performance for subtle masses without a trade-off in detection of average masses.

Keywords:  computer-aided  detection,  prior  mammogram,  mass  detection,  AFROC  analysis 

1.  INTRODUCTION 

Breast cancer is one of the leading causes of cancer mortality among women1. Studies indicate that radiologists do not
detect all carcinomas that are visible upon retrospective analyses of the images2-8. Computer-aided diagnosis (CAD) is
considered to be one of the promising approaches that may improve the sensitivity of mammography9,10.


An important application of a CAD system is to serve as a second reader that alerts radiologists to subtle cancers that may
be overlooked. Masses seen retrospectively on prior mammograms represent difficult cases that are more likely to
be missed by radiologists. One way to study the ability of a CAD system to detect subtle cancers is therefore to evaluate
its accuracy in detecting these missed cancers on prior mammograms. Our previous experience indicates that CAD schemes
trained with cancers on current images do not perform well in detecting masses seen retrospectively on prior images11.
In this study, we designed new techniques to improve the detection performance for subtle masses on prior
mammograms and evaluated the new CAD system on both prior and current mammograms by comparing it with our
previously developed CAD system12.


2.  MATERIALS  AND  METHODS 


2.1  Materials 

All mammograms in this study were collected from patient files in the Department of Radiology at the University of
Michigan with Institutional Review Board (IRB) approval. The mammograms were digitized with a LUMISYS 85
laser film scanner with a pixel size of 50 µm × 50 µm and 4096 gray levels. The scanner was calibrated to have a linear
relationship between gray levels and optical densities (O.D.) from 0.1 to greater than 3 O.D. units. The nominal O.D.
range of the scanner is 0-4. The full-resolution mammograms were first smoothed with a 2×2 box filter and
subsampled by a factor of 2, resulting in images with a pixel size of 100 µm × 100 µm. These images were used as the
input to our CAD system. The data set used in this study contained 159 patients. Each exam had two
mammographic views, resulting in a total of 318 current mammograms and 402 prior mammograms. Forty-two
patients had two prior examinations from different years. All mammograms were obtained before biopsy. There were
159 biopsy-proven masses in this data set. Figures 1 and 2 show the histograms of mass sizes and visibility,
respectively, for the current and prior masses. The size of a mass was estimated as its longest diameter seen on the
mammograms. The visibility of the masses was rated by an experienced radiologist on a 10-point scale, with 1
representing the most visible masses and 10 the most difficult cases relative to the cases seen in clinical practice. The
mass size ranged from 3 to 42 mm (mean size: 14.3±8.6 mm on current mammograms and 10.9±6.6 mm on prior
mammograms), and the visibility ratings extended over the entire range. For the current mammograms, 140 of the
masses were visible on both views and 19 were visible on only one view. For the prior mammograms, 100 masses were
visible on both views and 101 were visible on only one view. Therefore, if the masses were counted independently by
mammographic view, there were 299 visible and 19 invisible masses on current mammograms and 301 visible and
101 invisible masses on prior mammograms.
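The smoothing and subsampling step can be sketched as a 2×2 block average, which is one common way to implement a 2×2 box filter followed by subsampling by a factor of 2. This is a minimal NumPy sketch, not the authors' code. The function name `smooth_and_subsample` is hypothetical, and cropping an odd trailing row or column is an assumption, because the text does not say how image borders were handled.

```python
import numpy as np

def smooth_and_subsample(image: np.ndarray) -> np.ndarray:
    """Average each non-overlapping 2x2 block (2x2 box filter + factor-2 subsampling).

    Assumption: an odd trailing row/column is cropped before averaging.
    """
    rows, cols = image.shape
    rows -= rows % 2  # crop to even dimensions
    cols -= cols % 2
    cropped = image[:rows, :cols].astype(np.float64)
    # Give each 2x2 block its own pair of axes, then average over those axes.
    blocks = cropped.reshape(rows // 2, 2, cols // 2, 2)
    return blocks.mean(axis=(1, 3))

# A 4x4 image on a 50 um grid becomes a 2x2 image on a 100 um grid.
full_res = np.arange(16, dtype=np.uint16).reshape(4, 4)
print(smooth_and_subsample(full_res))  # 2x2 array of block means
```

Each output pixel averages four input pixels. This halves the sampling rate and also suppresses pixel-level noise before the detection steps.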
Figure 1. Histogram of the sizes of the 299 masses on current mammograms and 301 masses on prior mammograms in our
data set. Mass sizes were measured as the longest dimension of the mass by an experienced MQSA radiologist. The size
of the masses in this data set ranged from 3 to 42 mm.
Figure 2. Histogram of the visibility of the masses in our data set. The visibility was evaluated on a 10-point rating
scale, with 1 representing the most visible masses and 10 the most difficult cases relative to the cases seen in
clinical practice. The masses that were not visible are plotted in the column labeled “INV”.
2.2 Methods


2.2.1  CAD  System  Overview 


Figure  3.  Block  diagram  of  a  single  CAD  system  for  mass  detection  on  mammograms. 


Our CAD system consists of four processing steps: 1) prescreening of mass candidates, 2) identification of suspicious
objects, 3) extraction of morphological and texture features, and 4) classification between normal and abnormal
regions by using rule-based and LDA classifiers. The block diagram for the CAD system is shown in Figure 3.

For the prescreening stage, we developed a new technique in which gradient field analysis was combined
with Hessian analysis to identify mass candidates. Both gradient field and Hessian analyses were designed to enhance
circular structures on mammograms and to suppress objects with other shapes. Gradient field analysis used the
directions of the gradient field, and Hessian analysis used the second derivatives of the image through the eigenvalues
of the Hessian matrix. After this enhancement filtering, the local maxima within the breast region were identified as
the mass candidates on each mammogram. The suspicious structure at each identified location was initially extracted
by a seed-based region growing method. An active contour method was then used to refine the initial
segmentation. Morphological, gray level histogram and run-length statistics (RLS) features were extracted from the
original region of interest (ROI) and the orientation field of the ROI for reduction of FPs.
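The sketch below illustrates the Hessian part of the prescreening. It computes a simple blob-likeness map from the eigenvalues of the Hessian matrix at each pixel. It is not the authors' filter, and it makes several assumptions. Masses are assumed to appear as bright (high pixel value) regions. Derivatives come from plain NumPy finite differences, without the multiscale smoothing a practical filter would need. The gradient field analysis is omitted, and the function name `hessian_blob_response` is hypothetical.

```python
import numpy as np

def hessian_blob_response(image: np.ndarray) -> np.ndarray:
    """Blob-likeness map from the eigenvalues of the 2-D Hessian at each pixel."""
    img = image.astype(np.float64)
    # First derivatives by central differences (axis 0 = rows/y, axis 1 = columns/x).
    iy, ix = np.gradient(img)
    # Second derivatives forming the Hessian [[ixx, ixy], [ixy, iyy]].
    ixy, ixx = np.gradient(ix)
    iyy, _ = np.gradient(iy)
    # Closed-form eigenvalues of a symmetric 2x2 matrix: half_trace +/- radius.
    half_trace = 0.5 * (ixx + iyy)
    radius = np.sqrt((0.5 * (ixx - iyy)) ** 2 + ixy ** 2)
    lam_max = half_trace + radius  # the larger (less negative) eigenvalue
    # Bright blob: both eigenvalues negative, so lam_max < 0. Ridge: lam_max is about 0.
    return np.where(lam_max < 0, -lam_max, 0.0)

# Synthetic test images: a round bright blob and an elongated bright ridge.
yy, xx = np.mgrid[0:64, 0:64]
blob = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / (2 * 5.0 ** 2))
ridge = np.exp(-((xx - 32) ** 2) / (2 * 5.0 ** 2))
blob_map = hessian_blob_response(blob)
ridge_map = hessian_blob_response(ridge)
print(np.unravel_index(np.argmax(blob_map), blob_map.shape))  # peak at the blob centre
print(ridge_map.max() < 0.01 * blob_map.max())                # the ridge is suppressed
```

At a bright, roughly circular structure both eigenvalues are strongly negative, so even the larger one gives a high response. Along an elongated ridge one eigenvalue is near zero, so the response is suppressed. This matches the stated goal of enhancing circular structures and suppressing other shapes.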


2.2.2 Training and testing of the CAD system

The hold-out method was used for training and testing our CAD system. We randomly separated the entire data set by
case into two independent subsets: the training subset, including 78 cases with 156 current and 200 prior mammograms,
and the test subset, including 81 cases with 162 current and 202 prior mammograms. Training included the selection of
parameters and features for the classifier in the CAD system. Once training was completed, the parameters
and features were fixed for testing. The new system was trained by using prior mammograms in the training set only.
The performance of the new system was compared with that of the previous CAD system on the current and prior
mammograms in the test set.
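Splitting by case, rather than by image, keeps all views of a patient in the same subset. A minimal sketch using Python's standard `random` module follows. The case identifiers, the fixed seed, the view labels "CC" and "MLO", and the function `split_by_case` are all hypothetical.

```python
import random

def split_by_case(case_ids, n_train, seed=0):
    """Randomly assign whole cases to the training or the test subset."""
    ids = sorted(case_ids)            # sort first so the result depends only on the seed
    random.Random(seed).shuffle(ids)
    return set(ids[:n_train]), set(ids[n_train:])

# Hypothetical identifiers for the 159 cases.
cases = [f"case{i:03d}" for i in range(159)]
train_cases, test_cases = split_by_case(cases, n_train=78)
print(len(train_cases), len(test_cases))  # 78 81

# Every image inherits the subset of its case, so no patient appears in both subsets.
images = [(c, view) for c in cases for view in ("CC", "MLO")]  # hypothetical view labels
train_images = [im for im in images if im[0] in train_cases]
```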

During training, feature selection with stepwise LDA was employed to obtain the best feature subset and reduce the
dimensionality of the feature space for an effective classifier. The detailed procedure has been described
elsewhere13. Briefly, at each step one feature was entered into or removed from the selected feature subset by analyzing
its effect on the selection criterion, which was chosen to be Wilks' lambda in this study. Since the appropriate threshold
values for feature entry, feature elimination, and tolerance of feature correlation were unknown, we used an automated
simplex optimization method to search for the best combination of thresholds in the parameter space. The simplex
algorithm used a leave-one-case-out resampling method within the training subset to select features and estimate the
weights of the LDA classifier. To provide a figure of merit to guide feature selection, the test discriminant scores from
the left-out cases were analyzed using receiver operating characteristic (ROC) methodology. The accuracy for
classification of masses and FPs was evaluated as the area under the ROC curve, Az. In this approach, feature selection
was performed without the left-out case so that the test performance would be less optimistically biased. However, the
selected feature set in each leave-one-case-out cycle could be slightly different because the training set of each cycle
differed from those of the other cycles by one case. In order to obtain a single trained classifier to apply to the hold-out
test subset, a final stepwise feature selection was performed on the entire training subset, using the best combination of
thresholds found in the simplex optimization procedure, to obtain the final set of features and estimate the weights of
the LDA. Note that the entire process of feature selection and classifier weight estimation was performed within the
training subset. The LDA classifier with the selected feature set was then fixed and applied to the test subset.
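The leave-one-case-out loop can be sketched as follows. This is a simplified illustration built on several assumptions. It uses a plain two-class Fisher LDA on a fixed feature set, with the stepwise Wilks' lambda selection and the simplex threshold search omitted. Az is estimated nonparametrically with the Wilcoxon-Mann-Whitney statistic rather than from a fitted ROC curve. The data are synthetic, and all function names are hypothetical.

```python
import numpy as np

def fit_lda(x, y):
    """Two-class Fisher LDA: weights w and bias b (y = 1 for masses, 0 for FPs)."""
    x1, x0 = x[y == 1], x[y == 0]
    m1, m0 = x1.mean(axis=0), x0.mean(axis=0)
    d1, d0 = x1 - m1, x0 - m0
    # Pooled within-class scatter; a tiny ridge term keeps it invertible.
    s_w = d1.T @ d1 + d0.T @ d0 + 1e-6 * np.eye(x.shape[1])
    w = np.linalg.solve(s_w, m1 - m0)
    b = -0.5 * (m1 + m0) @ w
    return w, b

def auc(pos, neg):
    """Wilcoxon-Mann-Whitney estimate of the area under the ROC curve."""
    pos = np.asarray(pos, dtype=float)[:, None]
    neg = np.asarray(neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def leave_one_case_out_az(x, y, case_id):
    """Score every object with an LDA trained on all other cases, then compute Az."""
    scores = np.empty(len(y))
    for c in np.unique(case_id):
        held_out = case_id == c
        w, b = fit_lda(x[~held_out], y[~held_out])
        scores[held_out] = x[held_out] @ w + b
    return auc(scores[y == 1], scores[y == 0])

# Synthetic example: 30 cases, each with one mass and three FP objects, two features.
rng = np.random.default_rng(0)
case_id = np.repeat(np.arange(30), 4)
y = np.tile([1, 0, 0, 0], 30)
x = rng.normal(size=(len(y), 2)) + 1.5 * y[:, None]
print(round(leave_one_case_out_az(x, y, case_id), 3))
```

The objects of the held-out case never enter the fit, so every score is a test score. This is why the text describes the resulting estimate as less optimistically biased than one computed on the training data.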


2.2.3  Evaluation  methods 

We used a free-response receiver operating characteristic (FROC) method to assess the overall performance of the CAD
scheme on this image set. An FROC curve was obtained by plotting the mass detection sensitivity as a function of the
number of FP marks per image as the decision threshold on the LDA classifier scores was varied. The detected individual
objects were compared with the “true” mass locations marked by the experienced radiologist, as described above. A
detected object was labeled as a true positive (TP) if the overlap between its bounding box and the bounding box of the
true mass, relative to the larger of the two bounding boxes, was over 25%. Otherwise, it was labeled as an FP. The 25%
threshold was selected as described in our previous study14.
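The 25% overlap rule can be written as a small function. This minimal sketch assumes axis-aligned boxes given as (x0, y0, x1, y1) in pixel coordinates, with x1 > x0 and y1 > y0. The function names are hypothetical.

```python
def box_area(box):
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)

def overlap_fraction(detected, truth):
    """Intersection area divided by the area of the larger of the two boxes."""
    ix = max(0, min(detected[2], truth[2]) - max(detected[0], truth[0]))
    iy = max(0, min(detected[3], truth[3]) - max(detected[1], truth[1]))
    return ix * iy / max(box_area(detected), box_area(truth))

def is_true_positive(detected, truth, threshold=0.25):
    """Label a detected object TP if its overlap fraction exceeds the threshold."""
    return overlap_fraction(detected, truth) > threshold

truth = (0, 0, 10, 10)
print(is_true_positive((5, 0, 15, 10), truth))  # True: overlap is 50% of the larger box
print(is_true_positive((2, 2, 4, 4), truth))    # False: a small box inside the mass covers only 4%
```

Normalizing by the larger box prevents two kinds of loose matches from counting as TPs: a very small detection inside a large mass, and a very large detection that merely contains a small mass.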

FROC curves were presented on a per-image and a per-case basis. For image-based FROC analysis, the mass on each
mammogram was considered an independent true object; the sensitivity was thus calculated relative to the number of
visible masses by image, which was 149 and 151 for the current and prior test subsets, respectively. For case-based
FROC analysis, the same mass imaged on the two-view mammograms was considered to be one true object, and
detection of the mass on either or both views was considered to be a TP detection; the sensitivity was thus
calculated relative to the number of masses by case, which was 81 and 90 for the current and prior test
subsets, respectively. For each test FROC curve, the sensitivity was estimated by counting the detected masses in the
test subset, and the FP marker rate was estimated from the FPs detected in the same test subset. The average
number of FP marks per image produced by the CAD system at a given sensitivity was estimated by counting the
FP objects detected in these cases at the corresponding decision threshold.
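The sketch below shows how image-based and case-based counting differ at one decision threshold. Sweeping the threshold over the LDA scores traces out the FROC curve. The data layout (a dict of per-view TP scores for each case, plus a flat list of FP scores) and the function name are hypothetical.

```python
def froc_point(tp_scores, fp_scores, n_images, threshold):
    """Return (image-based sensitivity, case-based sensitivity, FPs per image).

    tp_scores: case id -> one entry per view on which the mass is visible, holding
               the highest score of any TP object on that view, or None if the
               mass was not detected on that view.
    fp_scores: scores of all FP objects on the n_images test images.
    """
    def hit(score):
        return score is not None and score >= threshold

    views = [s for per_case in tp_scores.values() for s in per_case]
    image_sens = sum(hit(s) for s in views) / len(views)
    # A case counts as detected if the mass is marked on either or both views.
    case_sens = sum(any(hit(s) for s in per_case) for per_case in tp_scores.values()) / len(tp_scores)
    fps_per_image = sum(s >= threshold for s in fp_scores) / n_images
    return image_sens, case_sens, fps_per_image

# Hypothetical data: case "C" has a mass visible on one view only.
tp = {"A": [0.9, 0.2], "B": [None, 0.7], "C": [0.1]}
fp = [0.8, 0.3, 0.6, 0.1, 0.5, 0.95]
print(froc_point(tp, fp, n_images=6, threshold=0.5))  # image-based 2/5, case-based 2/3, 4/6 FPs/image
```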

In order to compare the performance of our CAD systems statistically, we employed the alternative free-response ROC
(AFROC) method15. In the AFROC method, the FROC data are first transformed by counting the number of
false-positive images (FPIs) instead of the FPs per image. The LDA score of an FPI is determined by the FP object with
the highest score on the image, regardless of how many lower-scoring FP objects are marked on the same image. The
ROCKIT curve fitting software and statistical significance tests for ROC analysis developed by Metz et al.16 can then be
used to analyze the AFROC data.
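The AFROC transformation described above keeps only the highest FP score on each image. The sketch below performs that transformation. As a stand-in for the ROCKIT binormal fit, and not a replacement for it, it then computes a nonparametric (Wilcoxon-type) area that compares lesion scores with false-positive-image scores. The sketch assumes that an undetected lesion, or an image with no FP mark, receives the lowest possible rating (negative infinity). The function names are hypothetical.

```python
import math

def fpi_scores(fp_scores_by_image):
    """One rating per image: its highest FP score, or -inf if it has no FP mark."""
    return [max(scores) if scores else -math.inf for scores in fp_scores_by_image]

def nonparametric_a1(lesion_scores, image_ratings):
    """Fraction of (lesion, image) pairs where the lesion outscores the FPI; ties count 1/2."""
    lesions = [-math.inf if s is None else s for s in lesion_scores]
    pairs = [(les > fpi) + 0.5 * (les == fpi) for les in lesions for fpi in image_ratings]
    return sum(pairs) / len(pairs)

fp_by_image = [[0.3, 0.7, 0.2], [], [0.4]]  # FP object scores on three hypothetical images
ratings = fpi_scores(fp_by_image)
print(ratings)                                      # [0.7, -inf, 0.4]
print(nonparametric_a1([0.9, 0.5, None], ratings))  # None marks an undetected lesion
```

With this transformation, the lower-scoring FP marks on an image (0.3 and 0.2 on the first image) no longer affect the figure of merit.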


3.  EXPERIMENTAL  RESULTS 

Figures 4 and 5 show the image-based and case-based FROC curves, respectively, for detection of masses on prior
mammograms. The case-based sensitivities for detection of masses on the prior mammograms (typically subtle masses)
in the test subset were 56% and 35% at 1 and 0.5 FPs/image with the new CAD system, compared with 48% and
32% at the same FP rates with the previous CAD system. The improvement with the new system on prior
mammograms was statistically significant (p = 0.036). When the new system was applied to the detection of masses
on the current mammograms (typically average masses) in the test subset, the case-based sensitivities were 77% and
70% at 1 and 0.5 FPs/image, compared with 75% and 56% at the same FP rates with the previous CAD system.
The difference between the two FROC curves for detection of average masses on current mammograms was not
statistically significant (p = 0.184). Image-based and case-based FROC curves for detection of masses on current
mammograms are shown in Figures 6 and 7, respectively.
Figure 4. Image-based test FROC curves on prior
mammograms.  Old  CAD:  detection  by  the  previous 
CAD  system  trained  on  both  current  and  prior 
mammograms.  New  CAD:  detection  by  the  CAD 
system  trained  on  prior  mammograms. 


Figure  5.  Case-based  test  FROC  curves  on  prior 
mammograms.  Old  CAD:  detection  by  the  previous 
CAD  system  trained  on  both  current  and  prior 
mammograms.  New  CAD:  detection  by  the  CAD 
system  trained  on  prior  mammograms. 


Figure  6.  Image-based  test  FROC  curves  on  current 
mammograms.  Old  CAD:  detection  by  the  previous 
CAD  system  trained  on  both  current  and  prior 
mammograms.  New  CAD:  detection  by  the  CAD 
system  trained  on  prior  mammograms. 


Figure  7.  Case-based  test  FROC  curves  on  current 
mammograms.  Old  CAD:  detection  by  the  previous 
CAD  system  trained  on  both  current  and  prior 
mammograms.  New  CAD:  detection  by  the  CAD 
system  trained  on  prior  mammograms. 
Table 1. Estimation of the statistical significance of the difference between the FROC performances of the previous CAD
system trained on both current and prior mammograms and the proposed CAD system trained on prior mammograms.

A1 (AFROC)    Current Test Set    Prior Test Set
Old CAD       0.51                0.26
New CAD       0.50                0.31
p-value       0.184               0.036

4.  DISCUSSION  AND  CONCLUSIONS 

In this study, we improved the accuracy of a CAD system for detection of subtle masses on prior mammograms. A
new prescreening method was developed to improve the sensitivity of mass detection. A new mass segmentation
method that combines a seed-based region growing method with an active contour method was also designed. RLS
features were extracted from the original ROIs and the newly derived orientation field of the ROIs for FP reduction.
The new CAD system significantly improved the performance of mass detection on prior mammograms without a
trade-off in the detection of masses on current mammograms. It is expected that the new CAD system can increase the
overall accuracy for detection of subtle early-stage breast cancers.